Re: [zfs-discuss] zfs on 32 bit?
So you had better post the nice, clean ZFS error message that you got on your screen, instead of posting about things you might have missed. Giving the correct information leads to the correct solution. In your case it is possibly the patch level, or a /format -e/ issue. Think about it!

milosz wrote: yeah, i get a nice clean zfs error message about disk size limits when i try to add the disk.

On Tue, Jun 16, 2009 at 4:26 PM, roland no-re...@opensolaris.org wrote: the only problems i've run into are: slow (duh) and will not take disks that are bigger than 1tb. do you think that 1tb limit is due to 32bit solaris ? -- This message posted from opensolaris.org

-- Moutacim LACHHAB Service Engineer Software Technical Services Center Sun Microsystems Inc. Email moutacim.lach...@sun.com +33(0)134030594 x31457 For knowledge and support: http://sunsolve.sun.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
roland wrote: so, we have a 128bit fs, but only support for 1tb on 32bit? i`d call that a bug, isn`t it ? is there a bugid for this? ;)

Not a ZFS bug. IIRC, the story goes something like this: an SMI label only works up to 1 TByte, so to go beyond 1 TByte you need an EFI label. For older x86 systems -- those which are 32-bit -- you probably have a BIOS which does not handle EFI labels. This will become increasingly irritating since 2 TByte disks are now hitting the store shelves, but it doesn't belong in a ZFS category.

Slightly more complicated than that. 32-bit Solaris can use at most 2^31 as a disk block address; a disk block is 512 bytes, so in total it can address 2^40 bytes. The SMI label found in Solaris 10 (update 8?) and OpenSolaris has been enhanced and can address 2TB, but only on a 64-bit system. I'm not sure which system uses EFI. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
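For reference, the arithmetic behind that limit is quick to check (a sketch; bc is only used here to spell out the numbers):

  $ echo '2^31 * 512' | bc
  1099511627776

That is 2^40 bytes, i.e. the roughly 1.1 TB (1 TiB) ceiling a 32-bit kernel can reach through a 32-bit block address.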
Re: [zfs-discuss] zfs on 32 bit?
$ psrinfo -pv
The physical processor has 1 virtual processor (0)
  x86 (CentaurHauls 6A9 family 6 model 10 step 9 clock 1200 MHz)
  VIA Esther processor 1200MHz

Also, some of the very small PC units out there, those things called Eee PC (or whatever), are probably 32-bit only.

It's true for most of the Intel Atom family (Zxxx and Nxxx, but not the 230 and 330, as those are 64-bit). Those are new systems. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
Not a ZFS bug. [SMI vs EFI labels vs BIOS booting] and so also only a problem for disks that are members of the root pool. ie, I can have 1Tb disks as part of a non-bootable data pool, with EFI labels, on a 32-bit machine? No; the daddr_t is only 32 bits. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
casper@sun.com wrote: It's true for most of the Intel Atom family (Zxxx and Nxxx but not the 230 and 330 as those are 64 bit) Those are new systems. Casper

I've actually just started to build my home raid using the Atom 330 (D945GCLF2):

Status of virtual processor 0 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:04.
  The i386 processor operates at 1600 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:26.
  The i386 processor operates at 1600 MHz, and has an i387 compatible floating point processor.

and it booted 64-bit just fine. (I thought uname -a showed that, but apparently it does not.) The only annoyance is that the onboard ICH7 is the $27c0, and not the $27c1 (with AHCI mode for hot-swapping). But I was always planning on adding a SATA PCI card since I need more than 2 HDDs. But to stay on-topic, it sounds like Richard Elling summed it up nicely, which is something Richard is really good at. Lund -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
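(An aside, not from the original post: a quick way to confirm which kernel actually booted is isainfo. The output shown below is what a 64-bit x86 system typically reports:

  $ isainfo -kv
  64-bit amd64 kernel modules

and "isainfo -b" prints just the word size, 32 or 64.)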
Re: [zfs-discuss] compression at zfs filesystem creation
Hello Richard, Monish Shah wrote: What about when the compression is performed in dedicated hardware? Shouldn't compression be on by default in that case? How do I put in an RFE for that? Is there a bugs.intel.com? :-) I may have misled you. I'm not asking for Intel to add hardware compression. Actually, we already have gzip compression boards that we have integrated into OpenSolaris / ZFS and they are also supported under NexentaStor. What I'm saying is that if such a card is installed, compression should be enabled by default. NB, Solaris already does this for encryption, which is often a more computationally intensive operation. Actually, compression is more compute intensive than symmetric encryption (such as AES). Public key encryption, on the other hand, is horrendously compute intensive, much more than compression or symmectric encryption. But, nobody uses public key encryption for bulk data encryption, so that doesn't apply. Your mileage may vary. You can always come up with compression algorithms that don't do a very good job of compressing, but which are light on CPU utilization. Monish I think the general cases are performed well by current hardware, and it is already multithreaded. The bigger issue is, as Bob notes, resource management. There is opportunity for people to work here, especially since the community has access to large amounts of varied hardware. Should we spin up a special interest group of some sort? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
Not a ZFS bug. IIRC, the story goes something like this: an SMI label only works up to 1 TByte, so to go beyond 1 TByte you need an EFI label. For older x86 systems -- those which are 32-bit -- you probably have a BIOS which does not handle EFI labels. This will become increasingly irritating since 2 TByte disks are now hitting the store shelves, but it doesn't belong in a ZFS category.

Hasn't the 1TB limit for SMI labels been fixed (= limit raised to 2TB) by PSARC/2008/336 Extended VTOC? http://www.opensolaris.org/os/community/on/flag-days/pages/2008091102/ But there still is a 1TB limit for the 32-bit kernel; the PSARC case includes this: The following functional limitations are applicable: * 32-bit kernel will not support disks > 1 TB. ... Btw. on older Solaris releases the install media always booted into a 32-bit kernel, even on systems that are capable of running the 64-bit kernel. That seems to have been changed with the latest OpenSolaris releases and that PSARC case, so that 64-bit systems can install to a disk > 1TB. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
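For anyone wanting to check which label type a disk currently carries, one approach (a sketch; the exact menu text varies by release) is format in expert mode, where relabeling prompts for the label type:

  # format -e
  (select the disk)
  format> label
  [0] SMI Label
  [1] EFI Label
  Specify Label type[0]: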
Re: [zfs-discuss] zfs on 32 bit?
casper@sun.com wrote: ie, I can have 1Tb disks as part of a non-bootable data pool, with EFI labels, on a 32-bit machine? No; the daddr_t is only 32 bits. This looks like a leftover problem from former times, when UFS was limited to 1 TB anyway. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
Not a ZFS bug. IIRC, the story goes something like this: an SMI label only works up to 1 TByte, so to go beyond 1 TByte you need an EFI label. For older x86 systems -- those which are 32-bit -- you probably have a BIOS which does not handle EFI labels. This will become increasingly irritating since 2 TByte disks are now hitting the store shelves, but it doesn't belong in a ZFS category. Hasn't the 1TB limit for SMI labels been fixed (= limit raised to 2TB) by PSARC/2008/336 Extended VTOC? http://www.opensolaris.org/os/community/on/flag-days/pages/2008091102/ But there still is a 1TB limit for the 32-bit kernel; the PSARC case includes this: The following functional limitations are applicable: * 32-bit kernel will not support disks > 1 TB. ... Btw. on older Solaris releases the install media always booted into a 32-bit kernel, even on systems that are capable of running the 64-bit kernel. Seems to have been changed with the latest OpenSolaris releases and that PSARC case, so that 64-bit systems can install to a disk > 1TB.

That was changed many builds ago. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving a disk between controllers
I had a system with its boot drive attached to a backplane, which worked fine. I tried moving that drive to the onboard controller and a few seconds into booting it would just reboot. In certain cases ZFS is able to find the drive on the new physical device path (IIRC: the disk's devid didn't change and the new physical location of the disk is already present in the /etc/devices/devid_cache). But in most cases you have to boot from the installation media and "zpool import -f rpool" the pool, with the disk attached at the new physical device path, so that the new physical device path gets recorded in the pool's on-disk label. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
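A sketch of that recovery procedure (rpool is the usual root pool name; adapt to your setup):

  1. Boot the installation/live media and get to a shell.
  2. # zpool import -f rpool      (importing records the new device paths in the pool labels)
  3. Reboot from the disk at its new location.

If the system still will not boot, the boot blocks may also need to be reinstalled, as discussed later in this thread.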
Re: [zfs-discuss] compression at zfs filesystem creation
David Magda dma...@ee.ryerson.ca writes: On Tue, June 16, 2009 15:32, Kyle McDonald wrote: So the cache saves not only the time to access the disk but also the CPU time to decompress. Given this, I think it could be a big win. Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are probably some of the largest files in most people's homedirs nowadays. indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible. 1 GB of e-mail is a lot (probably my entire personal mail collection for a decade) and will compress well; 1 GB of audio files is nothing, and won't compress at all. Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU? So the correct answer on whether compression should be enabled by default is it depends. (IMHO :) ) I'd be interested to see benchmarks on MySQL/PostgreSQL performance with compression enabled. my *guess* would be it isn't beneficial since they usually do small reads and writes, and there is little gain in reading 4 KiB instead of 8 KiB. what other uses cases can benefit from compression? -- Kjetil T. Homme Redpill Linpro AS - Changing the game ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
Erik Trimble wrote: Dennis is correct in that there are significant areas where 32-bit systems will remain the norm for some time to come. And choosing a 32-bit system in these areas is completely correct. That said, I think the issue is that (unlike Linux), Solaris is NOT a super-duper-plays-in-all-possible-spaces OS. It's a specialized OS, intended for specific market segments. Embedded is not really one of them; nor are ultra-low-end netbooks or appliances such as set-top boxes (though, with the increasing functionality of DVRs, that may change soon). http://opensolaris.org/os/project/osarm/ -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme kjeti...@linpro.no wrote: indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible. Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU?

How do you define substantial? My opensolaris snv_111b installation has a 1.47x compression ratio for /, with the default compression. It's well worth it for me. -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
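For anyone who wants to check their own numbers, the ratio is exposed as a per-dataset property (dataset names and values below are only examples):

  $ zfs get compressratio rpool/ROOT/opensolaris
  NAME                    PROPERTY       VALUE  SOURCE
  rpool/ROOT/opensolaris  compressratio  1.47x  -

"zfs get -r compressratio rpool" reports it for every dataset in the pool.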
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
On 16 June 09 at 19:55, Jose Martins wrote: Hello experts, IHAC that wants to put more than 250 Million files on a single mountpoint (in a directory tree with no more than 100 files on each directory). He wants to share such a filesystem by NFS and mount it through many Linux Debian clients. We are proposing a 7410 OpenStorage appliance... He is claiming that certain operations like find, even if taken from the Linux clients on such an NFS mountpoint, take significantly more time than if such an NFS share was provided by other NAS providers like NetApp...

10%, 100%, 1% or more? Knowing the magnitude helps diagnostics. What kind of pool is this? This should be a read performance test: pool type and total disk rotation speed impact the resulting performance.

Can someone confirm if this is really a problem for ZFS filesystems?... Nope. Is there any way to tune it?... We thank any input. Best regards Jose ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme kjeti...@linpro.no wrote: indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible. Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU?

How do you define substantial? My opensolaris snv_111b installation has a 1.47x compression ratio for /, with the default compression. It's well worth it for me.

Indeed; I've had a few systems with: UFS (boot env 1), UFS (boot env 2), swap. lucreate couldn't fit everything in one (old UFS) partition because of dump and swap; with compression I can fit multiple environments (more than two). I still use disk swap because I have some bad experiences with ZFS swap. (ZFS appears to cache, and that is very wrong.) Now I use: rpool (using both the UFS partitions, now concatenated into one slice) and real swap.

My ZFS/Solaris wish list is this:
- when you convert from UFS to ZFS, zpool create fails and requires create -f; I'd like zpool create to report *all* errors, not just one (so you know exactly what collateral damage you would do): has a UFS filesystem, s2 overlaps s0, etc.
- zpool upgrade should fail if one of the available boot environments doesn't support the new version (or upgrade only to the lowest supported zfs version)

Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
Fajar A. Nugraha fa...@fajar.net writes: Kjetil Torgrim Homme wrote: indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible. Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU? How do you define substantial? My opensolaris snv_111b installation has 1.47x compression ratio for /, with the default compression. It's well worthed for me. I don't really care if my / is 5 GB or 3 GB. how much faster is your system operating? what's the compression rate on your data areas? -- Kjetil T. Homme Redpill Linpro AS - Changing the game ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
Ok, so you mean the comments are mostly FUD and bull shit? Because there are no bug reports from the whiners? Could this be the case? It is mostly FUD? Hmmm...? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are probably some of the largest files in most people's homedirs nowadays. indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible. If we are talking about data on people's desktops and laptops, yes, it is not very common to see a lot of compressible data. There will be some other examples, such as desktops being used for engineering drawings. The CAD files do tend to be compressible and they tend to be big. In any case, the really interesting case for compression is for business data (databases, e-mail servers, etc.) which tends to be quite compressible. ... I'd be interested to see benchmarks on MySQL/PostgreSQL performance with compression enabled. my *guess* would be it isn't beneficial since they usually do small reads and writes, and there is little gain in reading 4 KiB instead of 8 KiB. OK, now you have switched from compressibility of data to performance advantage. As I said above, this kind of data usually compresses pretty well. I agree that for random reads, there wouldn't be any gain from compression. For random writes, in a copy-on-write file system, there might be gains, because the blocks may be arranged in sequential fashion anyway. We are in the process of doing some performance tests to prove or disprove this. Now, if you are using SSDs for this type of workload, I'm pretty sure that compression will help writes. The reason is that the flash translation layer in the SSD has to re-arrange the data and write it page by page. If there is less data to write, there will be fewer program operations. Given that write IOPS rating in an SSD is often much less than read IOPS, using compression to improve that will surely be of great value. At this point, this is educated guesswork. I'm going to see if I can get my hands on an SSD to prove this. Monish what other uses cases can benefit from compression? -- Kjetil T. Homme Redpill Linpro AS - Changing the game ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
thank you, caspar. to sum up here (seems to have been a lot of confusion in this thread): the efi vs. smi thing that richard and a few other people have talked about is not the issue at the heart of this. this: 32 bit Solaris can use at most 2^31 as disk address; a disk block is 512bytes, so in total it can address 2^40 bytes. A SMI label found in Solaris 10 (update 8?) and OpenSolaris has been enhanced and can address 2TB but only on a 64 bit system. is what the problem is. so 32-bit zfs cannot use disks larger than 1(.09951)tb regardless of whether it's for the root pool or not. let me repeat that i do not consider this a bug and do not want to see it fixed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
Monish Shah mon...@indranetworks.com writes: I'd be interested to see benchmarks on MySQL/PostgreSQL performance with compression enabled. my *guess* would be it isn't beneficial since they usually do small reads and writes, and there is little gain in reading 4 KiB instead of 8 KiB. OK, now you have switched from compressibility of data to performance advantage. As I said above, this kind of data usually compresses pretty well. the thread has been about I/O performance since the first response, as far as I can tell. I agree that for random reads, there wouldn't be any gain from compression. For random writes, in a copy-on-write file system, there might be gains, because the blocks may be arranged in sequential fashion anyway. We are in the process of doing some performance tests to prove or disprove this. Now, if you are using SSDs for this type of workload, I'm pretty sure that compression will help writes. The reason is that the flash translation layer in the SSD has to re-arrange the data and write it page by page. If there is less data to write, there will be fewer program operations. Given that write IOPS rating in an SSD is often much less than read IOPS, using compression to improve that will surely be of great value. not necessarily, since a partial SSD write is much more expensive than a full block write (128 KiB?). in a write intensive application, that won't be an issue since the data is flowing steadily, but for the right mix of random reads and writes, this may exacerbate the bottleneck. At this point, this is educated guesswork. I'm going to see if I can get my hands on an SSD to prove this. that'd be great! -- Kjetil T. Homme Redpill Linpro AS - Changing the game ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
32 bit Solaris can use at most 2^31 as disk address; a disk block is 512bytes, so in total it can address 2^40 bytes. A SMI label found in Solaris 10 (update 8?) and OpenSolaris has been enhanced and can address 2TB but only on a 64 bit system. is what the problem is. so 32-bit zfs cannot use disks larger than 1(.09951)tb regardless of whether it's for the root pool or not. I think this isn't a problem with the 32-bit zfs module, but with all of the 32-bit Solaris kernel. The daddr_t type is used in a *lot* of places, and is defined as a signed 32-bit integer (long) in the 32-bit kernel. It seems that there already are 64-bit disk address types defined, diskaddr_t and lldaddr_t (that could be used in the 32-bit kernel, too), but a lot of the existing kernel code doesn't use them. And redefining the existing daddr_t type to 64-bit long long for the 32-bit kernel won't work, because it would break binary compatibility. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On Wed, Jun 17, 2009 at 8:37 PM, Orvar Korvar no-re...@opensolaris.org wrote: Ok, so you mean the comments are mostly FUD and bull shit? Unless there is real step-by-step reproducible proof, then yes, it is a completely useless waste of time and BS that I would not care about at all, if I were you. -- Kind regards, BM Things, that are stupid at the beginning, rarely ends up wisely. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving a disk between controllers
On Tue, Jun 16, 2009 at 6:46 PM, T Johnson tjohnso...@gmail.com wrote: Is there a problem with moving drives from one controller to another that my googlefu is not turning up? I had a system with its boot drive attached to a backplane, which worked fine. I tried moving that drive to the onboard controller and a few seconds into booting it would just reboot. Performing a fresh install while connected to the onboard controller would then make later reboots work fine. If I reversed the order and moved the drive back to the hotswap backplane, I would have the same reboot problem. Any clues? Thanks, TJ

Moving normal drives in a pool around isn't a problem. If you move a boot drive, you need to update grub. That has nothing to do with ZFS though; that would occur on any OS I'm aware of that uses grub as a bootloader. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
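For the record, on x86 Solaris/OpenSolaris the GRUB boot blocks are typically reinstalled with installgrub; the device name below is just a placeholder for the boot slice at its new location:

  # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0

On SPARC the equivalent tool is installboot with the ZFS bootblk.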
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
Jose, I hope our OpenStorage experts weigh in on 'is this a good idea'; it sounds scary to me, but I'm overly cautious anyway. I did want to raise the question of other client expectations for this opportunity: what are the intended data protection requirements, how will they back up and recover the files, do they intend to apply replication in support of a disaster recovery plan, and are the intended data protection schemes practical? The other area that jumps out at me is concurrent access: in addition to the 'find' by 'many' clients, does the client have any performance requirements that must be met to ensure the solution is successful? Does any of the above have to happen at the same time? I'm not in a position to evaluate these considerations for this opportunity, simply sharing some areas that, often enough, are overlooked as we address the chief complaint. Regards, Robert

Jose Martins wrote: Hello experts, IHAC that wants to put more than 250 Million files on a single mountpoint (in a directory tree with no more than 100 files on each directory). He wants to share such a filesystem by NFS and mount it through many Linux Debian clients. We are proposing a 7410 OpenStorage appliance... He is claiming that certain operations like find, even if taken from the Linux clients on such an NFS mountpoint, take significantly more time than if such an NFS share was provided by other NAS providers like NetApp... Can someone confirm if this is really a problem for ZFS filesystems?... Is there any way to tune it?... We thank any input. Best regards Jose

-- Robert C. Ungar ABCP Professional Services Delivery Storage Solutions Specialist Telephone 585-598-9020 Sun Microsystems 345 Woodcliff Drive Fairport, NY 14450 www.sun.com/storage ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On 17-Jun-09, at 7:37 AM, Orvar Korvar wrote: Ok, so you mean the comments are mostly FUD and bull shit? Because there are no bug reports from the whiners? Could this be the case? It is mostly FUD? Hmmm...? Having read the thread, I would say without a doubt. Slashdot was never the place to go for accurate information about ZFS. --Toby ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs list -t snapshots
cindy.swearin...@sun.com writes: [...] # zfs list -rt snapshot z3/www [...] Yeah... now we're talking, thanks. I'm still a little curious though as to why `zfs list -t snapshot' by itself, without a dataset, only lists snapshots under z3/www. I understand about the `-r recursive' but would probably never have thought of it on my own; but why does z3/www list without -r and with only -t as an argument, and not one of the others? I'm pretty sure z3/www was the last one I created... if that matters. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs list -t snapshots
Harry Putnam rea...@newsguy.com writes: cindy.swearin...@sun.com writes: [...] # zfs list -rt snapshot z3/www [...] Yeah... now were talking thanks I'm still a little curious though as to why `zfs list -t snapshot' By itself without a dataset, only lists snapshots under z3/www I understand about the `-r recursive' but would probably never have thought of it on my own, but why does z3/www list without -r and only -t as argument, and not one of the others? I'm pretty sure z3/www was the last one I created... if that matters. Er... never mind... apparently I just never scrolled back far enough to see the rest of the output with `zfs list -t snapshot' (no other args), and just saw the part including z3/www which comes last. Looking again I see all the data sets get listed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
On Wed, Jun 17 at 13:49, Alan Hargreaves wrote: Another question worth asking here is, is a find over the entire filesystem something that they would expect to be executed with sufficient regularity that the execution time would have a business impact. Exactly. That's such an odd business workload on 250,000,000 files that there isn't likely to be much of a shortcut other than just throwing tons of spindles (or SSDs) at the problem, and/or having tons of memory. If the finds are just by name, that's easy for the system to cache, but if you're expecting to run something against the output of find with -exec to parse/process 250M files on a regular basis, you'll likely be severely IO bound. Almost to the point of arguing for something like Hadoop or another form of distributed map/reduce on your dataset with a lot of nodes, instead of a single storage server. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
Jose, I believe the problem is endemic to Solaris. I have run into similar problems doing a simple find(1) in /etc. On Linux, a find operation in /etc is almost instantaneous. On Solaris, it has a tendency to spin for a long time. I don't know what their use of find might be, but running updatedb on the Linux clients (with the NFS file system mounted, of course) and using locate(1) will give you a work-around on the Linux clients. Caveat emptor: There is a staleness factor associated with this solution, as any new files dropped in after updatedb runs will not show up until the next updatedb is run. HTH louis

On 06/16/09 11:55, Jose Martins wrote: Hello experts, IHAC that wants to put more than 250 Million files on a single mountpoint (in a directory tree with no more than 100 files on each directory). He wants to share such a filesystem by NFS and mount it through many Linux Debian clients. We are proposing a 7410 OpenStorage appliance... He is claiming that certain operations like find, even if taken from the Linux clients on such an NFS mountpoint, take significantly more time than if such an NFS share was provided by other NAS providers like NetApp... Can someone confirm if this is really a problem for ZFS filesystems?... Is there any way to tune it?... We thank any input. Best regards Jose ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
On Wed, June 17, 2009 06:15, Fajar A. Nugraha wrote: Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU? How do you define substantial? My opensolaris snv_111b installation has a 1.47x compression ratio for /, with the default compression. It's well worth it for me. And how many GB is that? ~1.5x is quite good, but if you're talking about a 7.5 GB install using only 3 GB of space while your homedir is 50 GB, it's not a lot in relative terms. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
Hi Louis! Solaris /usr/bin/find and Linux (GNU-) find work differently! I have experienced dramatic runtime differences some time ago. The reason is that Solaris find and GNU find use different algorithms. GNU find uses the st_nlink (number of links) field of the stat structure to optimize it's work. Solaris find does not use this kind of optimization because the meaning of number of links is not well defined and file system dependent. If you are interested, take a look at, say, CR 4907267 link count problem is hsfs CR 4462534 RFE: pcfs should emulate link counts for directories Dirk Am 17.06.2009 um 18:08 schrieb Louis Romero: Jose, I believe the problem is endemic to Solaris. I have run into similar problems doing a simple find(1) in /etc. On Linux, a find operation in /etc is almost instantaneous. On solaris, it has a tendency to spin for a long time. I don't know what their use of find might be but, running updatedb on the linux clients (with the NFS file system mounted of course) and using locate(1) will give you a work-around on the linux clients. Caveat Empore: There is a staleness factor associated with this solution as any new files dropped in after updatedb runs will not show up until the next updatedb is run. HTH louis On 06/16/09 11:55, Jose Martins wrote: Hello experts, IHAC that wants to put more than 250 Million files on a single mountpoint (in a directory tree with no more than 100 files on each directory). He wants to share such filesystem by NFS and mount it through many Linux Debian clients We are proposing a 7410 Openstore appliance... He is claiming that certain operations like find, even if taken from the Linux clients on such NFS mountpoint take significant more time than if such NFS share was provided by other NAS providers like NetApp... Can someone confirm if this is really a problem for ZFS filesystems?... Is there any way to tune it?... We thank any input Best regards Jose -- Sun Microsystems GmbH Dirk Nitschke Nagelsweg 55 Storage Architect 20097 Hamburg Phone: +49-40-251523-413 Germany Fax: +49-40-251523-425 http://www.sun.de/Mobile: +49-172-847 62 66 dirk.nitsc...@sun.com --- Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten - Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
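To illustrate the optimization being discussed (an aside with illustrative commands, not from the original posts): on filesystems with classic UFS semantics, a directory's link count is 2 plus its number of subdirectories, so GNU find can stop calling stat() on remaining entries once it has found nlink-2 subdirectories. For example, with GNU coreutils on Linux:

  $ mkdir -p d/a d/b d/c
  $ stat -c %h d
  5

On filesystems that do not maintain this convention (hsfs, pcfs, some NFS servers), the link count tells find nothing reliable, which is why Solaris find does not take the shortcut.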
Re: [zfs-discuss] zfs on 32 bit?
Solaris is NOT a super-duper-plays-in-all-possible-spaces OS. yes, i know - but it`s disappointing that not even 32bit and 64bit x86 hardware is handled the same. 1TB limit on 32bit, less stable on 32bit. sorry, but if you are used to linux, solaris is really weird. issue here, limitation there doh! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] space in use by snapshots
hello, i`m doing backups to several backup-dirs where each is a sub-filesystem on /zfs, i.e. /zfs/backup1 , /zfs/backup2 i do snapshots on daily base, but have a problem: how can i see, how much space is in use by the snapshots for each sub-fs, i.e. i want to see what`s being in use on /zfs/backup1 (that`s easy, just du -s -h /zfs/backup1) and how much space do the snapshots need (that seems not so easy) thanks roland -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
Hi Louis! Solaris /usr/bin/find and Linux (GNU-) find work differently! I have experienced dramatic runtime differences some time ago. The reason is that Solaris find and GNU find use different algorithms. GNU find uses the st_nlink (number of links) field of the stat structure to optimize its work. Solaris find does not use this kind of optimization because the meaning of number of links is not well defined and file system dependent.

But that's not what's under discussion: apparently the *same* clients get different performance from an OpenStorage system vs a NetApp system. I think we need to know much more, and I think OpenStorage can give the information you need. (Yes, I did have problems because of GNU find's shortcuts: they don't work all the time.) Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
roland wrote: Solaris is NOT a super-duper-plays-in-all-possible-spaces OS. yes, i know - but it`s disappointing that not even 32bit and 64bit x86 hardware is handled the same. 1TB limit on 32bit, less stable on 32bit. sorry, but if you are used to linux, solaris is really weird. issue here, limitation there doh!

Which is true, but so it is for Linux - you're just familiar with Linux's limitations, so they don't seem like limitations anymore. E.g.: Linux handles 32-bit programs under 64-bit Linux in a much less clean way than Solaris does. Also, 2.4 (x86) kernels have a 1TB block device size limit, while 2.6 Linux x86 is limited to 16TB block devices. Solaris handles many more (i.e. a larger maximum number of) CPUs than even the latest Linux. It's a transitional adjustment - you don't expect a Semi to drive the same as a Hummer, or a Hummer to drive the same as a Porsche, do you? Each OS has its strengths and weaknesses; pick your poison. It's actually NOT a good idea for all OSes to have the same feature set. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files
Dirk Nitschke dirk.nitsc...@sun.com wrote: Solaris /usr/bin/find and Linux (GNU-) find work differently! I have experienced dramatic runtime differences some time ago. The reason is that Solaris find and GNU find use different algorithms.

Correct: Solaris find honors the POSIX standard, GNU find does not :-(

GNU find uses the st_nlink (number of links) field of the stat structure to optimize its work. Solaris find does not use this kind of optimization because the meaning of number of links is not well defined and file system dependent.

GNU find makes illegal assumptions about the value in st_nlink for directories. These assumptions are derived from implementation specifics found in historic UNIX filesystem implementations, but there is no guarantee for the assumed behavior. If you are interested, take a look at, say, CR 4907267 link count problem in hsfs. Hsfs just shows you the numbers set up by the ISO-9660 filesystem creation utility. If you use a recent original mkisofs (like the one that has come with Solaris for the last 1.5 years), you get the same behavior for hsfs and UFS. The related feature was implemented in October 2006 in mkisofs. If you use mkisofs from one of the non-OSS-friendly Linux distributions like Debian, RedHat, Suse, Ubuntu or Mandriva, you use a 5-year-old mkisofs version and thus the values in st_nlink for directories are random numbers. The problems caused by programs that ignore POSIX rules have been discussed several times on the POSIX mailing list. In order to solve the issue, I have proposed several times to introduce a new pathconf() call that allows asking whether a directory has historic UFS semantics for st_nlink. Hsfs knows whether the filesystem was created by a recent mkisofs and thus would be able to give the right return value. NFS clients would need to implement an RPC that allows them to retrieve the value from the exported filesystem on the server side.

On 17.06.2009 at 18:08, Louis Romero wrote: Jose, I believe the problem is endemic to Solaris. I have run into similar problems doing a simple find(1) in /etc. On Linux, a find operation in /etc is almost instantaneous. On Solaris, it has a

If you ignore standards you may get _apparent_ speed. On Linux this speed is usually bought by giving up correctness.

tendency to spin for a long time. I don't know what their use of find might be, but running updatedb on the Linux clients (with the NFS file system mounted of course) and using locate(1) will give you a work-around on the Linux clients.

With NFS, things are even more complex and in principle similar to the hsfs case, where the OS filesystem implementation just shows you the values set up by mkisofs. On an NFS client, you see the numbers that have been set up by the NFS server, but you don't know what filesystem type is under the NFS server. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] space in use by snapshots
Hi Roland, Current Solaris releases, SXCE (build 98 or later) or OpenSolaris 2009.06, provide space accounting features to display space consumed by snapshots, descendent datasets, and so on. On my OSOL 2009.06 system with automatic snapshots running, I can see the space that is consumed by snapshots by using this syntax:

# zfs list -o space

The output doesn't format well in email so I didn't include it. To see descendent datasets' space consumption, use this syntax:

# zfs list -ro space rpool/export/home

These features are described here: http://docs.sun.com/app/docs/doc/817-2271/ghbxt?l=en&a=view See the Space accounting properties section. Cindy

roland wrote: hello, i`m doing backups to several backup-dirs where each is a sub-filesystem on /zfs, i.e. /zfs/backup1 , /zfs/backup2 i do snapshots on daily base, but have a problem: how can i see, how much space is in use by the snapshots for each sub-fs, i.e. i want to see what`s being in use on /zfs/backup1 (that`s easy, just du -s -h /zfs/backup1) and how much space do the snapshots need (that seems not so easy) thanks roland ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
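For readers who have not seen it, "zfs list -o space" has these columns; the figures below are made up purely for illustration:

  # zfs list -ro space rpool/export/home
  NAME               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
  rpool/export/home  55.5G  12.1G     1.34G   10.8G              0          0

USEDSNAP is the space that would be reclaimed if all of the dataset's snapshots were destroyed, which answers the original question of how much space the snapshots of each backup filesystem consume.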
[zfs-discuss] Checksum errors
  pool: space01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was
        made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using
        'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 2.48% done, 4h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        space01      ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c0t1d0   ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     2

errors: No known data errors

The last drive shows two checksum errors, but iostat(1M) shows no hardware errors on that disk:

  iostat -Ene | grep Hard | grep c1t11d0
  c1t11d0 Soft Errors: 178 Hard Errors: 0 Transport Errors: 0

I'm not sure what I need to do, respectively how else I can determine if the device needs replaced. Do I perform zpool clear, or do I need to replace c1t11d0, or do I rerun scrub? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Checksum errors
Hi UNIX admin, I would check fmdump -eV output to see if this error is isolated or persistent. If fmdump says this error is isolated, then you might just monitor the status. For example, if fmdump says that these errors occurred on 6/15 and you moved this system on that date or you know that someone shouted at c1t11d0 on that date, then those events might explain this issue and you can use zpool clear to clear the error state. If fmdump says the c1t11d0 error persists over a period of time, then I could consider replacing this device. You can review more diagnostic tips here: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Resolving_Hardware_Problems Cindy UNIX admin wrote: pool: space01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub in progress, 2.48% done, 4h18m to go config: NAME STATE READ WRITE CKSUM space01 ONLINE 0 0 0 raidz ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 raidz ONLINE 0 0 0 c1t9d0 ONLINE 0 0 0 c1t10d0 ONLINE 0 0 0 c1t11d0 ONLINE 0 0 2 errors: No known data errors The last drive shows two checksum errors, but iostat(1M) shows no hardware errors on that disk: iostat -Ene | grep Hard | grep c1t11d0 c1t11d0 Soft Errors: 178 Hard Errors: 0 Transport Errors: 0 I'm not sure what I need to do, respectively how else I can determine if the device needs replaced. Do I perform zpool clear, or do I need to replace c1t11d0, or do I rerun scrub? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
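A sketch of that sequence, using the pool and device names from the original post:

  # fmdump -eV | grep -i c1t11d0     (any repeated ereports against this disk?)
  # zpool clear space01 c1t11d0      (clear the counters if the errors look isolated)
  # zpool scrub space01              (re-scrub, then re-check with zpool status -x)

If the checksum count starts climbing again after the clear and scrub, replacing the disk ("zpool replace space01 c1t11d0 <new-disk>") would be the prudent next step.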
Re: [zfs-discuss] zfs on 32 bit?
djm == Darren J Moffat darr...@opensolaris.org writes: cd == Casper Dik casper@sun.com writes: djm http://opensolaris.org/os/project/osarm/ yeah. many of those ARM systems will be low-power builtin-crypto-accel builtin-gigabit-MAC based on Orion and similar, NAS (NSLU2-ish) things begging for ZFS. cd It's true for most of the Intel Atom family (Zxxx and Nxxx but cd not the 230 and 330 as those are 64 bit) Those are new cd systems. the 64-bit atom are desktop, and the 32-bit are laptop. They are both current chips right now---the 64-bit are not newer than 32-bit. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
bmm == Bogdan M Maryniuk bogdan.maryn...@gmail.com writes: tt == Toby Thain t...@telegraphics.com.au writes: ok == Orvar Korvar no-re...@opensolaris.org writes: bmm Personally I am running various open solaris versions on a bmm VirtualBox as a crash dummy, as well as running osol on a real bmm systems. All ZFS. It sounds like what you're doing to ``reproduce'' the problem is: to use solaris. This is about what I imagined, and isn't what I had in mind as adequate. bmm Please give here a clear steps that fails for you, steps given on this list were: 1. use iSCSI or FCP as a vdev. 2. reboot the target but do not reboot the ZFS initiator. bmm provide some dtrace output etc. haha, ``hello, slashpot? This is slashkettle.'' It's just funny that after watching others on this list (sometimes with success!) debug their corrupt filesystems, the tool you latched onto was dtrace, and not mdb or zdb which do not appear in Sun marketing nearly so often. bmm Unless there is real step-by-step reproducible proof, corruption problems with other filesystems generally do not work this way, though we can try to get closer to it. What's more common is to pass around an image of the corrupt filesystem. Surely you can understand there is such thing as a ``hard to reproduce problem?'' Is the phrase so new to you? If you'd experience with other filesystems in their corruption-prone infancy, it wouldn't be. ok Ok, so you mean the comments are mostly FUD and bull shit? ok Because there are no bug reports from the whiners? Access to the bug database is controlled. Access to the mailing list is not. The posters did point to reports on the mailing list. tt Slashdot was never the place to go for accurate information tt about ZFS. again, the posts in the slashdot thread complaining about corruption were just pointers to original posts on this list, so attacking the forum where you saw the pointer instead of the content of its destination really is clearly _ad hominem_. pgpcSK6FkkCsN.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] space in use by snapshots
great, will try it tomorrow! thanks very much! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
Bob Friesenhahn wrote: On Mon, 15 Jun 2009, Bob Friesenhahn wrote: On Mon, 15 Jun 2009, Rich Teer wrote: You actually have that backwards. :-) In most cases, compression is very desirable. Performance studies have shown that today's CPUs can compress data faster than it takes for the uncompressed data to be read or written. Do you have a reference for such an analysis based on ZFS? I would be interested in linear read/write performance rather than random access synchronous access. Perhaps you are going to make me test this for myself. Ok, I tested this for myself on a Solaris 10 system with 4 3GHz AMD64 cores and see that we were both right. I did an iozone run with compression and do see a performance improvement. I don't know what the data iozone produces looks like, but it clearly must be quite compressable. Testing was done with a 64GB file: KB reclen write rewrite read reread uncompressed: 67108864 128 359965 354854 550869 554271 lzjb: 67108864 128 851336 924881 1289059 1362625 Unfortunately, during the benchmark run with lzjb the system desktop was essentially unusable with misbehaving mouse and keyboard as well as reported 55% CPU consumption. Without the compression the system is fully usable with very little CPU consumed. If the system is dedicated to serving files rather than also being used interactively, it should not matter much what the CPU usage is. CPU cycles can't be stored for later use. Ultimately, it (mostly*) does not matter if one option consumes more CPU resources than another if those CPU resources were otherwise going to go unused. Changes (increases) in latencies are a consideration but probably depend more on process scheduler choice and policies. *Higher CPU usage will increase energy consumption, heat output, and cooling costs...these may be important considerations in some specialized dedicated file server applications, depending on operational considerations. The interactivity hit may pose a greater challenge for any other processes/databases/virtual machines run on hardware that also serves files. The interactivity hit may also be evidence that the process scheduler is not fairly or effectively sharing CPU resources amongst the running processes. If scheduler tweaks aren't effective, perhaps dedicating a processor core(s) to interactive GUI stuff and the other cores to filesystem duties would help smooth things out. Maybe zones be used for that? With a slower disk subsystem the CPU overhead would surely be less since writing is still throttled by the disk. It would be better to test with real data rather than iozone. There are 4 sets of articles with links and snippets from their test data below. Follow the links for the full discussion: First article: http://blogs.sun.com/dap/entry/zfs_compression#comments Hardware: Sun Storage 7000 # The server is a quad-core 7410 with 1 JBOD (configured with mirrored storage) and 16GB of RAM. No SSD. # The client machine is a quad-core 7410 with 128GB of DRAM. 
Summary: text data set

  Compression  Ratio   Total   Write  Read
  off          1.00x    3:30    2:08  1:22
  lzjb         1.47x    3:26    2:04  1:22
  gzip-2       2.35x    6:12    4:50  1:22
  gzip         2.52x   11:18    9:56  1:22
  gzip-9       2.52x   12:16   10:54  1:22

Summary: media data set

  Compression  Ratio   Total   Write  Read
  off          1.00x    3:29    2:07  1:22
  lzjb         1.00x    3:31    2:09  1:22
  gzip-2       1.01x    6:59    5:37  1:22
  gzip         1.01x    7:18    5:57  1:21
  gzip-9       1.01x    7:37    6:15  1:22

Second article/discussion: http://ekschi.com/technology/2009/04/28/zfs-compression-a-win-win/ http://blogs.sun.com/observatory/entry/zfs_compression_a_win_win

Third article summary: ZFS and MySQL/InnoDB shows that gzip is often cpu-bound on current processors; lzjb improves performance. http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/ Hardware: SunFire X2200 M2 w/64GB of RAM and 2 x dual-core 2.6GHz Opterons; Dell MD3000 w/15 x 15K SCSI disks and mirrored 512MB battery-backed write caches. "Also note that this is writing to two DAS enclosures with 15 x 15K SCSI disks apiece (28 spindles in a striped+mirrored configuration) with 512MB of write cache apiece."

TABLE1

  compression   size  ratio  time
  uncompressed  172M  1      0.207s
  lzjb           79M  2.18X  0.234s
  gzip-1         50M  3.44X  0.24s
  gzip-9         46M  3.73X  0.217s
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On 17-Jun-09, at 5:42 PM, Miles Nordin wrote: bmm == Bogdan M Maryniuk bogdan.maryn...@gmail.com writes: tt == Toby Thain t...@telegraphics.com.au writes: ok == Orvar Korvar no-re...@opensolaris.org writes: tt Slashdot was never the place to go for accurate information tt about ZFS. again, the posts in the slashdot thread complaining about corruption were just pointers to original posts on this list, so attacking the forum where you saw the pointer instead of the content of its destination really is clearly _ad hominem_. Ad foruminem? !! Or did you simply mean, uncalled-for? /. is no person... And most of the thread really was rubbish. If one or two posts linked to the mailing list, that doesn't change it. --Toby ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
David Magda wrote: On Tue, June 16, 2009 15:32, Kyle McDonald wrote: So the cache saves not only the time to access the disk but also the CPU time to decompress. Given this, I think it could be a big win. Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are probably some of the largest files in most people's homedirs nowadays. 1 GB of e-mail is a lot (probably my entire personal mail collection for a decade) and will compress well; 1 GB of audio files is nothing, and won't compress at all. Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU? So the correct answer on whether compression should be enabled by default is it depends. (IMHO :) ) The performance tests I've found almost universally show LZJB as not being cpu-bound on recent equipment. A few years from now GZIP may get away from being cpu-bound. As performance tests on current hardware show that enabling LZJB improves overall performance it would make sense to enable it by default. In the future when GZIP is no longer cpu-bound, it might become the default (or there could be another algorithm). There is a long history of previously formidable tasks starting out as cpu-bound but quickly progressing to an 'easily handled in the background' task. Decoding MP3 and MPEG1, MPEG2 (DVD resolutions), softmodems (and other host signal processor devices), and RAID are all tasks that can easily be handled by recent equipment. Another option/idea to consider is using LZJB as the default compression method, and then performing a background scrub-recompress during otherwise idle times. Technique ideas: 1.) A performance neutral/performance enhancing technique: use any algorithm that is not CPU bound on your hardware, and rarely if ever has worse performance than the uncompressed state 2.) Adaptive technique 1: rarely used blocks could be given the strongest compression (using an algorithm tuned for the data type detected), while frequently used blocks would be compressed at a performance neutral or performance improving levels. 3.) Adaptive technique 2: rarely used blocks could be given the strongest compression (using an algorithm tuned for the data type detected), while frequently used blocks would be compressed at a performance neutral or performance improving levels. As the storage device gets closer to its native capacity, start applying compression both proactively (to new data) and retroactively (to old data), progressively using more powerful compression techniques as the maximum native capacity is approached. Compression could delay the users from reaching the 80-95% capacity point where system performance curves often have their knees (a massive performance degradation with each additional unit). 4.) Maximize space technique: detect the data type and use the best available algorithm for the block. As a counterpoint, if drive capacities keep growing at their current pace it seems they ultimately risk obviating the need to give much thought to the compression algorithm, except to choose one that boosts system performance. (I.e. in hard drives, compression may primarily be used to improve performance rather than gain extra storage space, as drive capacity has grown many times faster than drive performance.) JPEGs often CAN be /losslessly/ compressed further by useful amounts (e.g. 25% space savings). 
There is more on this here:

Tests:
http://www.maximumcompression.com/data/jpg.php
http://compression.ca/act/act-jpeg.html
http://www.downloadsquad.com/2008/09/11/winzip-12-supports-lossless-jpg-compression/
http://download.cnet.com/8301-2007_4-10038172-12.html
http://www.online-tech-tips.com/software-reviews/winzip-vs-7-zip-best-compression-method/

These have source code available:
http://sylvana.net/jpeg-ari/
PAQ8R http://www.cs.fit.edu/~mmahoney/compression/ (general info http://en.wikipedia.org/wiki/PAQ )

This one says source code is not yet available (implying it may become available):
http://www.elektronik.htw-aalen.de/packjpg/packjpg_m.htm

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
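For anyone wanting to experiment with the defaults being discussed, enabling compression per dataset is a one-liner; the dataset names below are placeholders, and only blocks written after the property is set get compressed (existing data stays as it is until rewritten):

$ zfs set compression=lzjb tank/home
$ zfs set compression=gzip-9 tank/archive

gzip levels 1-9 are available as gzip-N, with plain gzip meaning gzip-6; lzjb remains the cheap choice when the goal is to stay off the CPU.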
[zfs-discuss] Linux and OS 2009
I have Ubuntu jaunty already installed on my PC; on the second HD I've installed OS 2009. Now I can't share info between these 2 OSes. I downloaded and installed ZFS-FUSE on jaunty, but its version is 6, whereas in OS 2009 the ZFS version is 14 or something like that. Of course, there are different versions. How can I share info between these 2 OSes? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
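A hedged sketch of one way this could work (the pool and device names are invented, and this is untested): zpool on-disk versions are backwards compatible, so a pool created at the highest version the older implementation understands can be moved back and forth with export/import. Since zfs-fuse here speaks version 6, on the OpenSolaris side something like:

$ pfexec zpool create -o version=6 shared c0d1p0
$ pfexec zpool export shared

and then "zpool import shared" from Ubuntu with zfs-fuse running. You give up any features newer than pool version 6, and a pool that has already been upgraded past what zfs-fuse supports will not import there; the other common route is simply sharing the data over NFS or CIFS instead of moving the pool between operating systems.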
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On Thu, Jun 18, 2009 at 6:42 AM, Miles Nordin car...@ivy.net wrote:

Surely you can understand there is such a thing as a ``hard to reproduce problem?'' Is the phrase so new to you? If you'd had experience with other filesystems in their corruption-prone infancy, it wouldn't be.

I understand your point, but I don't understand what you're trying to achieve this way. Of course, not everything that you can do you should do (like rebooting your target etc.), and of course it helps once it's reproducible. In the same way, if you have a mirror of USB hard drives, then swap cables and reboot, and your mirror is gone. But that's not because of ZFS, if you look more closely... That's why I think that saying My $foo crashes therefore it is all crap is a bad idea: either help to fix it or just don't use it; thus fsck and lost+found are your friends on ext3 with a corrupted superblock after yet another Linux kernel panic. :-) JFYI: *all* filesystems crash and lose data from time to time. That's what backups are for. Hence, if you use your backups quite often, then you can find the problem and report it here. That would be very appreciated and helpful. Thanks! -- Kind regards, BM Things, that are stupid at the beginning, rarely ends up wisely. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On Thu 18/06/09 09:42, Miles Nordin car...@ivy.net sent: Access to the bug database is controlled. No, the bug database is open. Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
On Wed, 17 Jun 2009, Haudy Kazemi wrote:

usable with very little CPU consumed. If the system is dedicated to serving files rather than also being used interactively, it should not matter much what the CPU usage is. CPU cycles can't be stored for later use. Ultimately, it (mostly*) does not matter if

Clearly you have not heard of the software flywheel:
http://www.simplesystems.org/users/bfriesen/software_flywheel.html

If I understand the blog entry correctly, for text data the task took up to 3.5X longer to complete, and for media data, the task took about 2.2X longer to complete with a maximum storage compression ratio of 2.52X.

For my backup drive using lzjb compression I see a compression ratio of only 1.53x.

Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
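A quick way to see what lzjb is actually achieving on a given pool (the pool name "backup" here is only a placeholder); compressratio reports the ratio achieved across what is currently stored in each dataset:

$ zfs get -r compressratio backup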
[zfs-discuss] Server Cloning With ZFS?
So I had an E450 running Solaris 8 with VxVM encapsulated root disk. I upgraded it to Solaris 10 ZFS root using this method:

- Unencapsulate the root disk
- Remove VxVM components from the second disk
- Live Upgrade from 8 to 10 on the now-unused second disk
- Boot to the new Solaris 10 install
- Create a ZFS pool on the now-unused first disk
- Use Live Upgrade to migrate root filesystems to the ZFS pool
- Add the now-unused second disk to the ZFS pool as a mirror

Now my E450 is running Solaris 10 5/09 with ZFS root, and all the same users, software, and configuration that it had previously. That is pretty slick in itself. But the server itself is dog slow and more than half the disks are failing, and maybe I want to clone the server on new(er) hardware. With ZFS, this should be a lot simpler than it used to be, right?

A new server has new hardware, new disks with different names and different sizes. But that doesn't matter anymore. There's a procedure in the ZFS manual to recover a corrupted server by using zfs receive to reinstall a copy of the boot environment into a newly created pool on the same server. But what if I used zfs send to save a recursive snapshot of my root pool on the old server, booted my new server (with the same architecture) from the DVD in single user mode and created a ZFS pool on its local disks, and did zfs receive to install the boot environments there?

The filesystems don't care about the underlying disks. The pool hides the disk specifics. There's no vfstab to edit. Off the top of my head, all I can think to have to change is the network interfaces. And that change is as simple as cd /etc ; mv hostname.hme0 hostname.qfe0 or whatever. Is there anything else I'm not thinking of?
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
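A rough sketch of the send/receive leg of that plan (the snapshot name, pool names, host name and target disk are all placeholders, and it assumes the new box is also SPARC, so installboot rather than installgrub applies):

# on the old server
zfs snapshot -r rpool@clone
zfs send -R rpool@clone | ssh newhost "zfs receive -Fdu newpool"

# on the new server, point the pool at the received boot environment
# and put the ZFS boot block on the disk
zpool set bootfs=newpool/ROOT/s10be newpool
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0

Beyond the /etc/hostname.* files, the received root still carries the old server's /etc/zfs/zpool.cache plus path_to_inst and dump device settings tied to the old hardware, so those are worth checking before the first boot.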
[zfs-discuss] problems with l2arc in 2009.06
Hi all,

Since we've started running 2009.06 on a few servers we seem to be hitting a problem with l2arc that causes it to stop receiving evicted arc pages. Has anyone else seen this kind of problem? The filesystem contains about 130G of compressed (lzjb) data, and looks like:

$ zpool status -v data
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        data          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t1d0p0  ONLINE       0     0     0
            c1t9d0p0  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t2d0p0  ONLINE       0     0     0
            c1t10d0p0 ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t3d0p0  ONLINE       0     0     0
            c1t11d0p0 ONLINE       0     0     0
        logs          ONLINE       0     0     0
          c1t7d0p0    ONLINE       0     0     0
          c1t15d0p0   ONLINE       0     0     0
        cache
          c1t14d0p0   ONLINE       0     0     0
          c1t6d0p0    ONLINE       0     0     0

$ zpool iostat -v data
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
data          133G   275G    334    926  2.35M  8.62M
  mirror     44.4G  91.6G    111    257   799K  1.60M
    c1t1d0p0     -      -     55    145   979K  1.61M
    c1t9d0p0     -      -     54    145   970K  1.61M
  mirror     44.3G  91.7G    111    258   804K  1.61M
    c1t2d0p0     -      -     55    140   979K  1.61M
    c1t10d0p0    -      -     55    140   973K  1.61M
  mirror     44.4G  91.6G    111    258   801K  1.61M
    c1t3d0p0     -      -     55    145   982K  1.61M
    c1t11d0p0    -      -     55    145   975K  1.61M
  c1t7d0p0      12K  29.7G     0     76     71  1.90M
  c1t15d0p0    152K  29.7G     0     78     11  1.96M
cache            -      -      -      -      -      -
  c1t14d0p0  51.3G  23.2G     51     35   835K  4.07M
  c1t6d0p0   48.7G  25.9G     45     34   750K  3.86M
-----------  -----  -----  -----  -----  -----  -----

After adding quite a bit of data to l2arc, it quits getting new writes, and read traffic is quite low, even though arc misses are quite high:

                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
data          133G   275G    550    263  3.85M  1.57M
  mirror     44.4G  91.6G    180      0  1.18M      0
    c1t1d0p0     -      -     88      0  3.22M      0
    c1t9d0p0     -      -     91      0  3.36M      0
  mirror     44.3G  91.7G    196      0  1.29M      0
    c1t2d0p0     -      -     95      0  2.74M      0
    c1t10d0p0    -      -    100      0  3.60M      0
  mirror     44.4G  91.6G    174      0  1.38M      0
    c1t3d0p0     -      -     85      0  2.71M      0
    c1t11d0p0    -      -     88      0  3.34M      0
  c1t7d0p0       8K  29.7G     0    131      0   790K
  c1t15d0p0    156K  29.7G     0    131      0   816K
cache            -      -      -      -      -      -
  c1t14d0p0  51.3G  23.2G     16      0   271K      0
  c1t6d0p0   48.7G  25.9G     14      0   224K      0
-----------  -----  -----  -----  -----  -----  -----

$ perl arcstat.pl
    Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
21:21:31   10M    5M     53    5M   53     0    0    2M   31   857M    1G
21:21:32   209    84     40    84   40     0    0    60   32   833M    1G
21:21:33   255    57     22    57   22     0    0     9    4   832M    1G
21:21:34   630   483     76   483   76     0    0   232   63   831M    1G

Arcstats output, just for completeness:

$ kstat -n arcstats
module: zfs                             instance: 0
name:   arcstats                        class:    misc
        c                               1610325248
        c_max                           2147483648
        c_min                           1610325248
        crtime                          129.137246015
        data_size                       528762880
        deleted                         14452910
        demand_data_hits                589823
        demand_data_misses              3812972
        demand_metadata_hits            4477921
        demand_metadata_misses          2069450
        evict_skip                      5347558
        hash_chain_max                  13
        hash_chains                     521232
        hash_collisions                 9991276
        hash_elements                   1750708
        hash_elements_max               2627838
        hdr_size                        25463208
        hits                            5067744
        l2_abort_lowmem                 3225
        l2_cksum_bad                    0
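A hedged way to narrow this down is to watch the L2ARC counters in the same kstat over time and see whether the feed thread is still issuing writes (the grep pattern is just a convenience; the statistic names are the standard arcstats ones):

$ kstat -p zfs:0:arcstats | grep l2_
$ kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hdr_size

Every buffer kept on the cache devices still needs a header in the main ARC, so a tightly capped ARC can stop the L2ARC from filling any further; l2_abort_lowmem already sitting at 3225 above suggests the feed thread keeps backing off because the ARC is short of space.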
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
The way I see it is that even though ZFS may be a wonderful filesystem, it is not the best solution for every possible (odd) setup. I.e. USB sticks have proven a bad idea with zfs mirrors, ergo - don't do it(tm). ZFS on iSCSI *is* flaky, and a host reboot without telling the target will most likely get the target to panic and/or crash in a couple of fun ways. Yes, I have tested, way too many times, and yes, I know this - therefore I do not reboot my hosts if not absolutely necessary, and if I have to, I usually tell my target(s) to unmount/export/calm down before I do it. The same is true for FC-connected vdevs, and thus - I don't reboot my SAN nor my FC-connected cabinets without telling the targets. I know that sometimes you can't avoid these situations, but then I've learned how to fix it when it happens, and there's no bs, just fix it.

I've been equally disappointed and encouraged by ZFS itself; I've lost data, I've recovered data (thanks to Victor Latushkin), and I'm ok with it. Why? Know thy filesystem! I know what I can do, what I can trust and what I can't, so I do and I don't accordingly.

Flaming people on /. is *not* the way to silence those who lost data due to corruptions; telling them Ok, that's sad, but perhaps ZFS is NOT for you if you want to run USB-connected iscsi-initiator over zfs raidz with mushrooms and sauce. would probably be a better idea. Nor is blaming *all* errors/corruptions that actually occur on bad hardware (the ECC discussion), stupid programs (mostly GNU-hate) or stupid administrators who didn't get the budget to run backups on 100TB+ nodes etc etc.

Now I'll probably get flamed by those who get religious over these kinds of discussions, but that's ok, I don't get hurt by flames on the internet... Arguing on the internet... special olympics yada yada.. ;-)

Best Regards, Timh

2009/6/18 Ian Collins masuma@quicksilver.net.nz: On Thu 18/06/09 09:42, Miles Nordin car...@ivy.net sent: Access to the bug database is controlled. No, the bug database is open. Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Timh Bergström System Operations Manager Diino AB - www.diino.com :wq ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?
On 18 June 2009 at 06:47, Timh Bergström timh.bergst...@diino.net wrote: The way I see it is that even though ZFS may be a wonderful filesystem, it is not the best solution for every possible (odd) setup. I.e. USB sticks have proven a bad idea with zfs mirrors, ergo - don't do it(tm). ZFS on iSCSI *is* flaky, and a host reboot without telling the target will most likely get the target to panic and/or crash in a couple of fun ways. Yes, I have tested, way too many times, and yes, I know this - therefore I do not reboot my hosts if not absolutely necessary, and if I have to, I usually tell my target(s) to unmount/export/calm down before I do it. The same is true for FC-connected vdevs, and thus - I don't reboot my SAN nor my FC-connected cabinets without telling the targets. I know that sometimes you can't avoid these situations, but then I've learned how to fix it when it happens, and there's no bs, just fix it. I've been equally disappointed and encouraged by ZFS itself; I've lost data, I've recovered data (thanks to Victor Latushkin), and I'm ok with it. Why? Know thy filesystem! I know what I can do, what I can trust and what I can't, so I do and I don't accordingly. Flaming people on /. is *not* the way to silence those who lost data due to corruptions; telling them Ok, that's sad, but perhaps ZFS is NOT for you if you want to run USB-connected iscsi-initiator over zfs raidz with mushrooms and sauce. would probably be a better idea. Nor is blaming *all* errors/corruptions that actually occur on bad hardware (the ECC discussion), stupid programs (mostly GNU-hate) or stupid administrators who didn't get the budget to run backups on 100TB+ nodes etc etc.

Edit: Of course blaming everything that happens on ZFS itself isn't the *right* thing to do either; rather blame the marketing. If Sun or any other PR says ZFS is the one and only filesystem you need for everything, tell them to go to h*ll, since that is never the truth. And as usual, just because you *can* doesn't mean you *should*.

//T

Now I'll probably get flamed by those who get religious over these kinds of discussions, but that's ok, I don't get hurt by flames on the internet... Arguing on the internet... special olympics yada yada.. ;-) Best Regards, Timh 2009/6/18 Ian Collins masuma@quicksilver.net.nz: On Thu 18/06/09 09:42, Miles Nordin car...@ivy.net sent: Access to the bug database is controlled. No, the bug database is open. Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Timh Bergström System Operations Manager Diino AB - www.diino.com :wq -- Timh Bergström System Operations Manager Diino AB - www.diino.com :wq ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] problems with l2arc in 2009.06
This is a MySQL database server, so if you are wondering about the smallish ARC size, it's being artificially limited by set zfs:zfs_arc_max = 0x80000000 in /etc/system, so that the majority of RAM can be allocated to InnoDB. I was told offline that it's likely because my ARC size has been limited to the point that it cannot utilize the L2ARC correctly. Can anyone tell me the correct ratio of ARC to L2ARC? Thanks again, Ethan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
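A minimal sketch of what raising that cap would look like; the 4 GB figure is purely illustrative, not a recommendation, and whether it is affordable depends on how much RAM InnoDB really needs:

* /etc/system -- illustrative only: raise the ARC ceiling to 4 GB
set zfs:zfs_arc_max = 0x100000000

After a reboot, kstat -p zfs:0:arcstats:c_max should report the new ceiling, and l2_hdr_size shows how much of the ARC the L2ARC headers are consuming. There is no single fixed ARC:L2ARC ratio; the constraint is that the headers for everything cached on the SSDs have to fit in the ARC alongside the working set.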