Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
The question that has occurred to me is: I *must* choose one of those support options for how long? I mean if I buy support for a machine for a year and put S11 Express in production on it, then I don't renew the support, am I now violating the license? That's bogus. I could be wrong but I don't think Sun ever did this. As far as I knew when I worked at Sun, I seem to remember that buying a machine gave you a 'right to use' Solaris (even future versions as I understood it) on that machine without any extra charge. Is there an option to just buy a license outright without paying for support? This is as bad as some application software companies are: license ends, app stops running. Actually this is worse, since it's not just one app, it's the whole OS. At least it doesn't refuse to run or cripple itself like some other OS does. ;) -Kyle

Licensing and Support for Oracle Solaris 11 Express
11 - Can I get support for Oracle Solaris 11 Express? Yes. Oracle Solaris 11 Express is covered under the Oracle Premier Support for Operating Systems or Oracle Premier Support for Systems support option for Oracle hardware, and Oracle Solaris Premier Subscription for non-Oracle hardware. Customers must choose either of these support options should they wish to deploy Oracle Solaris 11 Express into a production environment.

[1] http://www.oracle.com/technetwork/server-storage/solaris11/overview/faqs-oraclesolaris11express-185609.pdf
[zfs-discuss] Any opinions on the Brocade 825 Dual port 8Gb FC HBA?
Does OpenSolaris/Solaris11 Express have a driver for it already? Anyone used one already? -Kyle
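One quick way to check whether the card is even recognized once it's in the box (the strings to grep for are guesses, not known device names):

  # is the card visible on the bus, and did a driver bind to it?
  prtconf -pv | grep -i brocade
  prtconf -D | grep -i fibre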
[zfs-discuss] Growing the swap vol?
Hi all, I'd like to give my machine a little more swap. I ran: zfs get volsize rpool/swap and saw it was 2G. So I ran: zfs set volsize=4G rpool/swap to double it. zfs get shows it took effect, but swap -l doesn't show any change. I ran swap -d to remove the device, and then swap -a to re-add it, and it still shows 2G (about 4 million blocks). How do I make the change take effect? -Kyle
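For reference, the sequence described above, using the default rpool/swap zvol paths (one commonly suggested variation is to remove the swap device before growing the volume rather than after):

  # remove the zvol from swap before resizing it
  swap -d /dev/zvol/dsk/rpool/swap
  # grow the volume, then add it back and check the new size
  zfs set volsize=4G rpool/swap
  swap -a /dev/zvol/dsk/rpool/swap
  swap -l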
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On 11/12/2010 10:03 AM, Edward Ned Harvey wrote: Since combining ZFS storage backend, via nfs or iscsi, with ESXi heads, I'm in love. But for one thing. The interconnect between the head and storage. 1G Ether is so cheap, but not as fast as desired. 10G ether is fast enough, but it's overkill and why is it so bloody expensive? Why is there nothing in between? Is there something in between?

I suppose you could try multiple 1G interfaces bonded together - does the ESXi hypervisor support LACP aggregations? I'm not sure it will help though, given the algorithms that LACP can use to distribute the traffic. -Kyle

Is there a better option? I mean, sata is cheap, and it's 3g or 6g, but it's not suitable for this purpose. But the point remains, there isn't a fundamental limitation that **requires** 10G to be expensive, or **requires** a leap directly from 1G to 10G. I would very much like to find a solution which is a good fit, to attach ZFS storage to vmware. What are people using, as interconnect, to use ZFS storage on ESX(i)? Any suggestions?
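For what it's worth, creating the aggregation on the Solaris side looks roughly like this on recent builds (interface names and address are placeholders; older releases use a different dladm syntax):

  # bundle two GigE ports into one LACP-active aggregation
  dladm create-aggr -L active -l e1000g0 -l e1000g1 aggr0
  ifconfig aggr0 plumb 192.168.10.5 netmask 255.255.255.0 up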
[zfs-discuss] Any opinions on these SSD's?
I'm shopping for an SSD for a ZIL. Looking around on NewEgg, at the claimed (not sure I believe them) IOPS, these caught my attention:

Corsair Force 80GB (CSSD-F80GBP2-BRKT) - 50K 4K aligned ran. write IOPS
OCZ Vertex 2 120GB (OCZSSD3-2VTX120G) - 50K 4K aligned ran. write IOPS
A-DATA S599 128GB (AS599S0128GM) - 50K 4K aligned write IOPS
Crucial RealSSD C300 128GB (CTFDDAC128MAG-1G1CCA) - 60K/30K 4K read/write IOPS

Any opinions? stories? other models I missed? Other questions: 1) The ZIL will be small compared to the size of these, can I use the rest as L2ARC or is that not such a good idea? 2) Will ZFS align the ZIL writes in such a way that those IOPS numbers will be close to attainable? -Kyle
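On question 1, a minimal sketch of splitting one SSD between slog and L2ARC (device and slice names are placeholders; whether sharing one device that way is wise is exactly the open question):

  # s0 carved out for the slog, s1 for L2ARC, using format(1M) beforehand
  zpool add tank log c4t2d0s0
  zpool add tank cache c4t2d0s1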
Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
On 8/7/2010 4:11 PM, Terry Hull wrote: It is just that lots of the PERC controllers do not do JBOD very well. I've done it several times making a RAID 0 for each drive. Unfortunately, that means the server has lots of RAID hardware that is not utilized very well. Doing that lets you use the cache, which is the only part of the RAID HW that I'd worry about wasting. Also, ZFS loves to see lots of spindles, and Dell boxes tend not to have lots of drive bays in comparison to what you can build at a given price point. I've found the R515 (the R510's cousin with AMD processors) to be very interesting in this regard. It has many more drive bays than most Dell boxes. I've also priced out the IBM x3630 M3, with even more drive bays, for about 20% more. Of course then you have warranty / service issues to consider. I don't know what your needs are, but I found Dell's 5yr onsite 10x5 NBD support to be priced very attractively. But I can live with a machine being down till the next day, or through a weekend. -Kyle -- Terry Hull Network Resource Group, Inc.
Re: [zfs-discuss] Running on Dell hardware?
On 10/25/2010 3:39 AM, Markus Kovero wrote: Any other feasible alternatives for Dell hardware? Wondering, are these issues mostly related to Nehalem-architectural problems, eg. c-states. So is there anything good in switching hw vendor? HP anyone? Note that while it was a Dell I was asking about, it's an AMD Opteron system (the R515). I doubt with an architecture that different that the same 'c-states' corner case will appear. Aren't there too many variables changing between AMD and Intel to have the exact same problem? Not that there won't be a different problem though. :) -Kyle Yours Markus Kovero
Re: [zfs-discuss] Running on Dell hardware?
Hi All, I'm currently considering purchasing 1 or 2 Dell R515's. With up to 14 drives, and up to 64GB of RAM, it seems like it's well suited for a low-end ZFS server. I know this box is new, but I wonder if anyone out there has any experience with it? How about the H700 SAS controller? Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I want to put some SSD's in a box like this, but there's no way I'm going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are they kidding? -Kyle
Re: [zfs-discuss] How to avoid striping ?
On 10/18/2010 4:28 AM, Habony, Zsolt wrote: I worry about head thrashing. Why? If your SAN group gives you a LUN that is at the opposite end of the array, I would think that was because they had already assigned the space in the middle to other customers (other groups like yours, or other hosts of yours). If so, don't you think that all those other hosts and customers will be reading and writing from that array all the time anyway? I mean if the heads are going to 'thrash', then they'll be doing so even before you request your second LUN, right? Adding your second LUN to the mix isn't going to seriously change the workload on the disks in the array. Though memory cache of large storage should make the problem easier, I would be more happy if I can be sure that zpool will not be handled as a stripe. Is there a way to avoid it, or can we be sure that the problem does not exist at all? As I think the logic above suggests, if the problem exists, it exists even when you only have 1 LUN. -Kyle
Re: [zfs-discuss] How to avoid striping ?
On 10/18/2010 5:40 AM, Habony, Zsolt wrote: (I do not mirror, as the storage gives redundancy behind LUNs.) By not enabling redundancy (mirror or RAIDZ[123]) at the ZFS level, you are opening yourself to corruption problems that the underlying SAN storage can't protect you from. The SAN array won't even notice the problem; ZFS will notice it, but (if you don't give it redundancy to work with) it won't be able to repair it for you. You'd be better off getting unprotected LUNs from the array, and letting ZFS handle the redundancy. -Kyle Online LUN expansion seems promising, and answering my question. Thank You for that. Zsolt
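A sketch of that layout, with placeholder LUN device names, so ZFS owns the redundancy across array LUNs:

  # two unprotected LUNs from the array, mirrored by ZFS
  zpool create tank mirror c3t0d0 c3t1d0
  # grow later by adding another mirrored pair
  zpool add tank mirror c3t2d0 c3t3d0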
Re: [zfs-discuss] RaidzN blocksize ... or blocksize in general ... and resilver
On 10/17/2010 9:38 AM, Edward Ned Harvey wrote: The default blocksize is 128K. If you are using mirrors, then each block on disk will be 128K whenever possible. But if you're using raidzN with a capacity of M disks (M disks useful capacity + N disks redundancy) then the block size on each individual disk will be 128K / M. Right? If I understand things correctly, I think this is why it is recommended that you pick an M that divides into 128K evenly. I believe powers of 2 are recommended. I think increasing the block size to 128K*M would be overkill, but that idea does make me wonder: In cases where M can't be a power of 2, would it make sense to adjust the block size so that M still divides evenly? If M were 4 then the data written to each drive would be 32K. So if you really wanted M to be 5 drives, is there an advantage to making the block size 160K, or if that's too big, how about 80K? Likewise, if you really wanted M to be 3 drives, would adjusting the block size to 96K make sense? -Kyle
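A quick back-of-the-envelope look at how a 128K record splits across M data disks (just the arithmetic behind the question, not a claim about exactly what ZFS writes in every case):

  # bytes per data disk for a 128K record, for a few values of M
  for m in 2 3 4 5 6; do
      echo "M=$m -> $(echo "scale=1; 131072 / $m" | bc) bytes per disk"
  done
  # M=4 gives an even 32768; M=5 gives 26214.4, which isn't even a whole
  # sector - hence the idea of bumping the recordsize to 160K or dropping to 80K.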
[zfs-discuss] ZFS, IPS (IBM ServeRAID) driver, and a kernel panic...
Hi, I have been trying out the latest NexentaCore and NexentaStor Community ed. builds (they have the driver I need built in) on the hardware I have with this controller. The only difference between the 2 machines is that the 'Core' machine has 16GB of RAM and the 'Stor' one has 12GB. On both machines I did the following: 1) Created a zpool consisting of a single RAIDZ from 5 300GB U320 10K drives. 2) Created 4 filesystems in the pool. 3) On the 4 filesystems I set the dedup and compression properties to cover all the combinations (off/off, off/on, on/off, and on/on). On the 'Stor' machine I elected to disable the ZIL and cacheflush through the web GUI. I didn't do this on the 'Core' machine. On the 'Core' machine I mounted the 4 filesystems from the 'Stor' machine via NFSv4.

Now for a bit of history. I tried out the 'Stor' machine in this exact config (but with ZIL and cache flushes on) about a month ago with version 3.0.2. At that time I used a Linux NFS client to time untarring the GCC sources to each of the 4 filesystems. This test repeatedly failed on the first filesystem by bringing the machine to its knees, to the point that I had to power cycle it. This time around I decided to use the 'Core' machine as the client so I could also time the same test to its local ZFS filesystems. At first I got my hopes up, because the test ran to completion (and rather quickly) locally on the 'Core' machine. I then added running it over NFS to the 'Stor' machine to the testing. In the beginning I was untarring it once on each filesystem, and even over NFS this worked (though slower than I'd hoped for having the ZIL and cacheflush disabled). So I thought I'd push the dedup a little harder, and I expanded the test to untar the sources 4 times per filesystem. This ran fine until the 4th NFS filesystem, where the 'Stor' machine panic'd. The client waited while it rebooted, and then resumed the test, causing it to panic a second time. For some reason it hung so badly the second time it didn't even reboot - I'll have to power cycle it Monday when I get to work.
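The pool and filesystem layout described above, sketched with placeholder names:

  # one raidz vdev from the five 300GB drives
  zpool create tank raidz c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
  # four filesystems covering every dedup/compression combination
  zfs create -o dedup=off -o compression=off tank/plain
  zfs create -o dedup=on -o compression=off tank/dedup
  zfs create -o dedup=off -o compression=on tank/comp
  zfs create -o dedup=on -o compression=on tank/both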
The 2 stack traces are identical:

panic[cpu3]/thread=ff001782fc60: BAD TRAP: type=e (#pf Page fault) rp=ff001782f9c0 addr=18 occurred in module unix due to a NULL pointer dereference
sched: #pf Page fault
Bad kernel fault at addr=0x18
pid=0, pc=0xfb863374, sp=0xff001782fab8, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 18 cr3: 500 cr8: c
rdi: ff03dc84fcfc rsi: ff03e1d03d98 rdx: 2
rcx: 2 r8: 0 r9: ff0017a51c60
rax: ff001782fc60 rbx: 2 rbp: ff001782fb10
r10: e10377c748 r11: ff00 r12: ff03dc84fcfc
r13: ff00 r14: ff00 r15: 10
fsb: 0 gsb: ff03e1d03ac0 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fb863374
cs: 30 rfl: 10286 rsp: ff001782fab8
ss: 38

ff001782f8a0 unix:die+dd ()
ff001782f9b0 unix:trap+177b ()
ff001782f9c0 unix:cmntrap+e6 ()
ff001782fb10 unix:mutex_owner_running+14 ()
ff001782fb40 ips:ips_remove_busy_command+27 ()
ff001782fb80 ips:ips_finish_io_request+a8 ()
ff001782fbb0 ips:ips_intr+7b ()
ff001782fc00 unix:av_dispatch_autovect+7c ()
ff001782fc40 unix:dispatch_hardint+33 ()
ff0018517580 unix:switch_sp_and_call+13 ()
ff00185175d0 unix:do_interrupt+b8 ()
ff00185175e0 unix:_interrupt+b8 ()
ff00185176e0 genunix:kmem_free+34 ()
ff0018517710 zfs:zio_pop_transforms+86 ()
ff0018517780 zfs:zio_done+152 ()
ff00185177b0 zfs:zio_execute+8d ()
ff0018517810 zfs:zio_notify_parent+a6 ()
ff0018517880 zfs:zio_done+3e2 ()
ff00185178b0 zfs:zio_execute+8d ()
ff0018517910 zfs:zio_notify_parent+a6 ()
ff0018517980 zfs:zio_done+3e2 ()
ff00185179b0 zfs:zio_execute+8d ()
ff0018517a10 zfs:zio_notify_parent+a6 ()
ff0018517a80 zfs:zio_done+3e2 ()
ff0018517ab0 zfs:zio_execute+8d ()
ff0018517b50 genunix:taskq_thread+248 ()
ff0018517b60 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc
0% done: 0 pages dumped, dump failed: error 5
rebooting...

As I read this, it's probably a bug in the IPS driver. But I really don't know anything about kernel panics. This seems 100% reproducible, so I'm happy to run more tests in KDB if it will help. As I've mentioned before, I'd be happy to try to work on the code myself if it were available. Anyone have any ideas? -Kyle
Re: [zfs-discuss] Announce: zfsdump
On 6/28/2010 10:30 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tristram Scott If you would like to try it out, download the package from: http://www.quantmodels.co.uk/zfsdump/ I haven't tried this yet, but thank you very much! Other people have pointed out bacula is able to handle multiple tapes, and individual file restores. However, the disadvantage of bacula/tar/cpio/rsync etc is that they all have to walk the entire filesystem searching for things that have changed. A compromise here might be to feed those tools the output from the new ZFS diff command (which 'diffs' 2 snapshots) when it arrives. That might get something close to the best of both worlds. -Kyle The advantage of zfs send (assuming incremental backups) is that it already knows what's changed, and it can generate a continuous datastream almost instantly. Something like 1-2 orders of magnitude faster per incremental backup.
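Once zfs diff does arrive, the idea might look roughly like this (dataset and snapshot names are placeholders, and the awk filter is only a guess at its output format):

  # list paths added or modified between two snapshots, archive just those
  zfs diff tank/src@monday tank/src@tuesday | \
      awk '$1 == "+" || $1 == "M" {print $2}' | \
      cpio -o > /backup/incr-tuesday.cpio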
Re: [zfs-discuss] SSDs adequate ZIL devices?
I've very infrequently seen the RAMSAN devices mentioned here. Probably due to price. However a long time ago I think I remember someone suggesting a build-it-yourself RAMSAN. Where is the down side of one or 2 OS boxes with a whole lot of RAM (and/or SSD's) exporting either RAM disks or zvols out over iSCSI, FCoE, or direct FC (can OS do that?) If the RAM and/or SSD's (or even HD's) were large enough, this box might be able to serve several other ZFS servers. A dedicated network, or direct connections if there are enough ports, should eliminate the net from being a bottleneck. A sub-$100 UPS (or 2) could protect the whole thing. I'm sure I'm missing something, but I'm not seeing it at the moment. Anyone else have any ideas? -Kyle
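A rough sketch of the Solaris side of such a box using COMSTAR (sizes, names, and the LU GUID are placeholders - the GUID is whatever create-lu prints):

  # a RAM-backed pool exporting a zvol as an iSCSI LUN
  ramdiskadm -a rd0 60g
  zpool create ramsan /dev/ramdisk/rd0
  zfs create -V 32g ramsan/slog01
  svcadm enable stmf svc:/network/iscsi/target:default
  stmfadm create-lu /dev/zvol/rdsk/ramsan/slog01
  stmfadm add-view 600144f0deadbeef0000000000000000
  itadm create-target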
Re: [zfs-discuss] Native ZFS for Linux
On 6/11/2010 12:32 AM, Erik Trimble wrote: On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote: On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal anu...@kqinfotech.com wrote: We at KQInfotech initially started on an independent port of ZFS to linux. When we posted our progress about the port last year, we came to know about the work on the LLNL port. Since then we started working to re-base our changes on top of Brian's changes. We are working on porting ZPL on that code. Our current status is that mount/unmount is working. Most of the directory operations and read/write is also working. There is still a lot more development work and testing that needs to go into this. But we are committed to make this happen so please stay tuned. Good times ahead! I don't mean to be a PITA, but I'm assuming that someone lawyerly has had the appropriate discussions with the porting team about how linking against the GPL'd Linux kernel means your kernel module has to be GPL-compatible. It doesn't matter if you distribute it outside the general kernel source tarball, what matters is that you're linking against a GPL program, and the old GPL v2 doesn't allow for a non-GPL-compatibly-licensed module to do that. As a workaround, take a look at what nVidia did for their X driver - it uses a GPL'd kernel module as a shim, which their codebase can then call from userland. Which is essentially what the ZFS FUSE folks have been reduced to doing. If the new work is a whole new implementation of the ZFS *design* intended for the linux kernel, then Yea! Great! (fortunately, it does sound like this is what's going on) Otherwise, OpenSolaris CDDL'd code can't go into a Linux kernel, module or otherwise. Actually my understanding of this is that it revolves around distribution (copying - since it's based on copyright) of the code. If the developers distribute source code, which is then compiled and linked to the GPL code by the *end-user*, then there are no issues, since the person combining the 2 codebases is not distributing the combined work further. The greyer area (though it can still be OK if I understand correctly) is when the code is distributed pre-compiled. On one hand presumably GPL headers were used to do the compiling, but on the other it is still the *end-user* that links the 2 'programs' together, and that's what really matters. I believe this is how all the proprietary binary drivers for Linux get around this issue. All the licenses do is hamper distribution. The vendors using shims may do so to make it easier to be included in major Linux distributions? -Kyle
Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks
On 6/9/2010 5:04 PM, Edward Ned Harvey wrote: Everything is faster with more ram. There is no limit, unless the total used disk in your system is smaller than the available ram in your system ... which seems very improbable. Off topic, but... When I managed a build/simulation farm for one of Sun's ASIC design teams, we had several 24 CPU machines with 96GB or 192GB of RAM and only 36GB or maybe 73GB of disk. Probably a special case though. ;) -Kyle
Re: [zfs-discuss] zfs/lofi/share panic
On 5/27/2010 2:45 PM, Jan Kryl wrote: Hi Frank, On 24/05/10 16:52 -0400, Frank Middleton wrote: Many many moons ago, I submitted a CR into bugs about a highly reproducible panic that occurs if you try to re-share a lofi mounted image. That CR has AFAIK long since disappeared - I even forget what it was called. This server is used for doing network installs. Let's say you have a 64 bit iso lofi-mounted and shared. You do the install, and then wish to switch to a 32 bit iso. You unshare, umount, delete the loopback, and then lofiadm the new iso, mount it and then share it. Panic, every time. Is this such a rare use-case that no one is interested? I have the backtrace and cores if anyone wants them, although such were submitted with the original CR. This is pretty frustrating since you start to run out of ideas for mountpoint names after a while unless you forget and get the panic. FWIW (even on a freshly booted system after a panic) # lofiadm zyzzy.iso /dev/lofi/1 # mount -F hsfs /dev/lofi/1 /mnt mount: /dev/lofi/1 is already mounted or /mnt is busy # mount -O -F hsfs /dev/lofi/1 /mnt # share /mnt # If you unshare /mnt and then do this again, it will panic. This has been a bug since before Open Solaris came out. It doesn't happen if the iso is originally on UFS, but UFS really isn't an option any more. FWIW the dataset containing the isos has the sharenfs attribute set, although it doesn't have to be actually mounted by any remote NFS for this panic to occur. Suggestions for a workaround most welcome! the bug (6798273) has been closed as incomplete with following note: I cannot reproduce any issue with the given testcase on b137. So you should test this with b137 or newer build. There have been some extensive changes going to treeclimb_* functions, so the bug is probably fixed or will be in near future. Let us know if you can still reproduce the panic on recent build. I don't know if the code path is the same enough, but you should also try it like this: # mount -F hsfs zyzzy.iso /mnt For many builds now, (Open)Solaris hasn't needed the 'lofiadm' step for ISO's (and possibly other FS's that can be guessed). I now put ISO's (for installs, just like you) directly in my /etc/vfstab. -Kyle thanks -jan
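For example, an /etc/vfstab line along those lines might look like this (ISO path and mount point are placeholders); recent builds will loop-mount the image without an explicit lofiadm step:

  #device to mount               device to fsck  mount point  FS type  fsck pass  mount at boot  options
  /export/isos/osol-134-x86.iso  -               /mnt/osol    hsfs     -          yes            ro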
Re: [zfs-discuss] nfs share of nested zfs directories?
On 5/27/2010 9:30 PM, Reshekel Shedwitz wrote: Some tips… (1) Do a zfs mount -a and a zfs share -a. Just in case something didn't get shared out correctly (though that's supposed to automatically happen, I think) (2) The Solaris automounter (i.e. in a NIS environment) does not seem to automatically mount descendent filesystems (i.e. if the NIS automounter has a map for /public pointing to myserver:/mnt/zfs/public but on myserver, I create a descendent filesystem in /mnt/zfs/public/folder1, browsing to /public/folder1 on another computer will just show an empty directory all the time). The automounter behaves the same regardless of whether NIS is involved or not (or LDAP for that matter). The automounter can be configured with files locally, and that won't change its behavior. The behavior you're describing has been the behavior of all flavors of NFS since it was born, and also doesn't have anything to do with the automounter - it was by design. No automounter I'm aware of is capable of learning on its own that 'folder1' is a new filesystem (not a new directory) and mounting it. So this isn't limited to Solaris. If you're in that sort of environment, you need to add another map on NIS. Your example doesn't specify if /public is a direct or indirect mount; being in / kind of implies it's direct, and those mounts can be more limiting (more so in the past), and most admins avoid using the auto.direct map for these reasons. If the example was /import/public with /import being defined by the auto.import map, then the solution to this problem is not an entirely new entry in the map for /import/public/folder1, but to convert the entry for /import/public to a hierarchical mount entry, specifying explicitly the folder1 sub-mount (see the example at the end of this message). A hierarchical mount can even mount folder1 from a different server than public came from. In the past (SunOS4 and early Solaris timeframe) hierarchical mounts had some limitations (mainly issues with unmounting them) that made people wary of them. Most if not all of those have been eliminated. In general the Solaris automounter is very reliable and flexible and can be configured to do almost anything you want. Recent Linux automounters (autofs4??) have come very close to the Solaris ones; however, earlier ones had some missing features, buggy features, and some different interpretations of the maps. But the issue described in this thread is not an automounter issue, it's a design issue of NFS - at least for all versions of NFS before v4. Version 4 has a feature that others have mentioned, called mirror mounts, that tries to pass along the information required for the client to re-create the sub-mount - even if the original fileserver mounted the sub-filesystem from another server! It's a cool feature, but NFSv4 support in clients isn't complete yet, so specifying the full hierarchical mount tree in the automount maps is still required. (3) Try using /net mounts. If you're not aware of how this works, you can browse to /net/computer name to see all the NFS mounts. On Solaris, /net *will* automatically mount descendent filesystems (unlike NIS). In general /net mounts are a bad idea. While it will basically scan the output of 'showmount -e' for everything the server exports, and mount it all, that's not exactly what you always want. It will only pick up sub-filesystems that are explicitly shared (which NFSv4 might also only do, I'm not sure) and it will miss branches of the tree if they are mounted from another server.
Also, most automounters that I'm aware of will only mount all the exported filesystems at the time of the access to /net/hostname, and (unless it's unused long enough to be unmounted) will miss all changes in what is exported on the server until the mount is triggered again. On top of that, /net/hostname mounts encourage embedding the hostname of the server in config files, scripts, and binaries (-R path for shared libraries) and that's not good, since you then can't move a filesystem from one host to another: you need to maintain that /net/hostname path forever - or edit many files and recompile programs. (If I recall correctly, this was once used as one of the arguments against shared libraries by some.) Because of this, by using /net/hostname, you give up one of the biggest benefits of the automounter - redirection. By making an auto.import map that has an entry for 'public' you allow yourself to be able to clone public to a new server, and modify the map to (over time, as it is unmounted and remounted) migrate the clients to the new server. Lastly, using /net also disables the load-sharing and failover abilities of read-only automounts, since you are by definition limiting yourself to one hostname. That was longer than I expected, but hopefully it will help some. :) -Kyle
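For illustration, a hierarchical entry in an indirect map (an auto.import map with placeholder host and paths) that mounts the descendant filesystem explicitly:

  # /etc/auto.import -- entries appear under /import
  public    /          myserver:/mnt/zfs/public \
            /folder1   myserver:/mnt/zfs/public/folder1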
[zfs-discuss] USB Flashdrive as SLOG?
Hi, I know the general discussion is about flash SSD's connected through SATA/SAS or possibly PCI-E these days. So excuse me if I'm asking something that makes no sense... I have a server that can hold 6 U320 SCSI disks. Right now I put in 5 300GB for a data pool, and 1 18GB for the root pool. I've been thinking lately that I'm not sure I like the root pool being unprotected, but I can't afford to give up another drive bay. So recently the idea occurred to me to go the other way. If I were to get 2 USB flash thumb drives, say 16 or 32 GB each, not only would I be able to mirror the root pool, but I'd also be able to put a 6th 300GB drive into the data pool. That led me to wonder whether partitioning out 8 or 12 GB on a 32GB thumb drive would be beneficial as a slog? I bet the USB bus won't be as good as SATA or SAS, but will it be better than the internal ZIL on the U320 drives? This seems like at least a win-win, and possibly a win-win-win. Is there some other reason I'm insane to consider this? -Kyle
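A sketch of what that could look like (device names are placeholders; the USB stick would be sliced with format(1M) first, roughly 18GB for the root mirror and the rest for the slog):

  # mirror the existing root slice onto the USB stick and make it bootable
  zpool attach rpool c1t0d0s0 c5t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c5t0d0s0
  # use the remaining slice as a log device for the data pool
  zpool add datapool log c5t0d0s1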
Re: [zfs-discuss] USB Flashdrive as SLOG?
On 5/25/2010 11:39 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Kyle McDonald I've been thinking lately that I'm not sure I like the root pool being unprotected, but I can't afford to give up another drive bay. I'm guessing you won't be able to use the USB thumbs as a boot device. But that's just a guess. No, I've installed to an 8GB one on my laptop and booted from it. And this server offers USB drives as a boot option, so I don't see why it wouldn't work. But I won't know till I try it. However, I see nothing wrong with mirroring your primary boot device to the USB. At least in this case, if the OS drive fails, your system doesn't crash. You're able to swap the OS drive and restore your OS mirror. True. If nothing else I may do at least that. That led me to wonder whether partitioning out 8 or 12 GB on a 32GB thumb drive would be beneficial as a slog? I think the only way to find out is to measure it. I do have an educated guess though. I don't think even the fastest USB flash drives are able to work quickly, with significantly low latency. Based on measurements I made years ago, so again I emphasize, the only way to find out is to test it. Yes, I guess I'll have to try some benchmarks. The thing that got me thinking was that many of these drives support a Windows feature called 'ReadyBoost' - which I think is just Windows swapping to the USB drive instead of HD - but Windows does a performance test on the device to see if it's fast enough. I thought maybe if it's faster to swap to than a HD, it might be faster for a slog too. But you're right, the only way to know is to measure it. One thing you could check, which does get you a lot of mileage for free is: Make sure your HBA has a BBU, and enable the WriteBack. In my measurements, this gains about 75% of the benefit that log devices would give you. My HBA's have 256MB of BB cache, and it's enabled on all 6 drives, so that should help. However I may have hit a bug in the 'isp' driver (still have to debug and see if that's the root cause) and I may need to yank the RAID enabler, and go back to straight SCSI. -Kyle
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
SNIP a whole lot of ZIL/SLOG discussion Hi guys, yep I know about the ZIL, and SSD slogs. While setting Nexenta up it offered to disable the ZIL entirely. For now I left it on. In the end (hopefully for only specific filesystems - once that feature is released) I'll end up disabling the ZIL for our software builds since: 1) The builds are disposable - we only need to save them if they finish, and we can restart them if needed. 2) The build servers are not on UPS, so a power failure is likely to make the clients lose all state and need to restart anyway. But this issue I've seen with Nexenta is not due to the ZIL. It runs until it literally crashes the machine. It's not just slow, it brings the machine to its knees. I believe it does have something to do with exhausting memory though. As Erast says it may be the IPS driver (though I've used that on b130 of SXCE without issues), or who knows what else. I did download some updates from Nexenta yesterday. I'm going to try to retest today or tomorrow. -Kyle
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Hi all, I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346, 2.8GHz Xeon, 12GB DDR2 RAM
1 built-in BGE interface for management
4-port Intel GigE card, aggregated, for data
IBM ServeRAID 7k with 256MB BB cache (isp driver)
6 RAID0 single-drive LUNs (so I can use the cache): 1 18GB LUN for the rpool, 5 300GB LUNs for the data pool
1 RAIDZ1 pool from the 5 300GB drives, with 4 test filesystems: no dedup/no compression, dedup/no compression, no dedup/compression, and dedup/compression

This is pretty old hardware, so I wasn't expecting miracles, but I thought I'd give it a shot. My work load is NFS service to software build servers (cvs checkouts, untarring files, compiling, etc.) I'm hoping the many CVS checkout trees will lend themselves to dedup well, and I know source code should compress easily. I set up one client with a single GigE connection, mounted the four filesystems (plus one from the NetApp we have here) and proceeded to write a loop to time both untarring the gcc-4.3.3 sources to those 5 filesystems, and to 1 local directory, and to rm -rf the sources too. The tar took 28 seconds, and the remove 10 seconds, in the local dir. Then on the first ZFS/NFS filesystem mount, it took basically forever and hung the Nexenta server. I was watching it go on the web admin page and it all looked fine for a while, then the client started reporting 'NFS Server not responding, still trying...' For a while, there were also 'NFS Server OK' messages too, and the web GUI remained responsive. Eventually the OK messages stopped, and the web GUI froze. I went and rebooted the NFS client, thinking that if the requests stopped the server might catch up, but it never started responding again. I was only untarring a file. How did this bring the machine down? I hadn't even gotten to the FS's that had dedup or compression turned on, so those shouldn't have affected things - yet. Any ideas? -Kyle
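The test loop was along these lines (mount points and tarball path are placeholders):

  # untar and remove the gcc sources on each mount, timing each step
  for fs in plain dedup comp both netapp; do
      ( cd /mnt/$fs && time tar xf /var/tmp/gcc-4.3.3.tar && time rm -rf gcc-4.3.3 )
  done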
Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote: valrh...@gmail.com valrh...@gmail.com writes: I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. [...] That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape. it would be inconvenient to make a dedup copy on harddisk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with There is a perl script that has been floating around on the internet for years that will convert copies of files on the same FS to hardlinks (sorry I don't have the name handy). So you don't need ZFS. Once this is done you can even recreate an ISO and burn it back to DVD (possibly merging hundreds of CD's into one DVD or BD!). The script can also delete the duplicates, but there isn't much control over which one it keeps - for backups you may really want to keep the earliest (or latest?) backup the file appeared in. Using ZFS dedup is an interesting way of doing this. However, archiving the result may be hard. If you use different datasets (FS's) for each backup, can you only send 1 dataset at a time (since you can only snapshot at the dataset level)? Won't that 'undo' the deduping? If you instead put all the backups on one dataset, then the snapshot can theoretically contain the deduped data. I'm not clear on whether 'send'ing it will preserve the deduping or not - or if it's up to the receiving dataset to recognize matching blocks? If the dedup is in the stream, then you may be able to write the stream to a DVD or BD. Still, if you save enough space so that you can add the required level of redundancy, you could just leave it on disk and chuck the DVD's. Not sure I'd do that, but it might let me put the media in the basement, instead of the closet, or on the desk next to me. -Kyle
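A minimal sketch of the ZFS-dedup approach (pool and dataset names are placeholders):

  # one dedup-enabled filesystem to hold the contents of every disc
  zfs create -o dedup=on archive/discs
  # after copying a batch of discs in, see how well it deduplicated
  zpool get dedupratio archive
  zfs list archive/discs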
Re: [zfs-discuss] zfs directory symlink owner
On 5/3/2010 7:41 AM, Michelle Knight wrote: The long ls command worked, as in it created the links, but they didn't work properly under the ZFS SMB share. I'm guessing you meant the 'long ln' command? If you look at what those 2 commands create you'll notice (in the output of ls -l) that the target the link points to has been recorded in the link differently. One will be relative (../a/foo) and the other absolute (/mirror/audio-Cd-Tracks/a/foo). This can affect how the SMB server processes these links when requests for them are made, depending on how the parent directories are shared (or not shared). The relative links should work, I would think, since they don't 'leave' the SMB share. They didn't work as in, on a remote Linux box, I could execute ls and see them, but I couldn't change in; permission issues. (despite having the correct ownership) and also on the remote linux box, the GUI file browser couldn't even see the folders. Are you also sharing these files to Windows machines? If you're only sharing them to Linux machines, then NFS would be so much easier to use. You'll still want relative links though. -Kyle By changing in to the directory and then executing the ls command relative to that point, everything worked. Odd.
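To make the difference concrete (paths borrowed from the example above, directory names hypothetical):

  cd /mirror/audio-Cd-Tracks/links
  ln -s ../a/foo foo-rel                        # relative target: ../a/foo
  ln -s /mirror/audio-Cd-Tracks/a/foo foo-abs   # absolute target
  ls -l foo-rel foo-abs                         # shows exactly what each link records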
Re: [zfs-discuss] zfs directory symlink owner
On 5/3/2010 4:56 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Kyle McDonald If you're only sharing them to Linux machines, then NFS would be so much easier to use. You'll still want relative links though. Only if you have infrastructure to sanitize the UID's. If you have disjoint standalone machines, then samba winbind works pretty well to map usernames to locally generated unique UID's. In which case, IMHO, samba is easier than NFS. However, if you do have some kind of domains LDAP, NIS, etc... then I agree 1,000% NFS is easier than samba. True, using local passwd files on more than a handful of machines can make adding and removing users and changing passwords a pain. But (and I could be wrong these days) in my experience, while the Samba server is great, the SMB client on Linux can only mount the share as a single specific user, and all accesses to files in the share are performed as that user. Right? That to me makes SMB a less desirable filesystem than NFS, where you can't really tell the difference between it and UFS or whatever. -Kyle
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
On 3/9/2010 1:55 PM, Matt Cowger wrote: That's a very good point - in this particular case, there is no option to change the blocksize for the application. I have no way of guessing the effects it would have, but is there a reason that the filesystem blocks can't be a multiple of the application block size? I mean 4 4KB app blocks to 1 16KB fs block sounds like it might be a decent compromise to me. Decent enough to make it worth testing anyway. -Kyle On 3/9/10 10:42 AM, Roch Bourbonnais roch.bourbonn...@sun.com wrote: I think this is highlighting that there is extra CPU requirement to manage small blocks in ZFS. The table would probably turn over if you go to 16K zfs records and 16K reads/writes from the application. Next step for you is to figure how much reads/writes IOPS do you expect to take in the real workloads and whether or not the filesystem portion will represent a significant drain of CPU resource. -r Le 8 mars 10 à 17:57, Matt Cowger a écrit : Hi Everyone, It looks like I've got something weird going with zfs performance on a ramdisk... ZFS is performing not even a 3rd of what UFS is doing.

Short version: Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't swapping. Create zpool on it (zpool create ram ...). Change zfs options to turn off checksumming (don't want it or need it), atime, compression, 4K block size (this is the application's native blocksize) etc. Run a simple iozone benchmark (seq. write, seq. read, rndm write, rndm read). Same deal for UFS, replacing the ZFS stuff with newfs stuff and mounting the UFS forcedirectio (no point in using buffer cache memory for something that's already in memory). Measure IOPs performance using iozone: iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g

With the ZFS filesystem I get around:
ZFS: (seq write) 42360 (seq read) 31010 (random read) 20953 (random write) 32525
Not SOO bad, but here's UFS:
UFS: (seq write) 42853 (seq read) 100761 (random read) 100471 (random write) 101141

For all tests besides the seq write, UFS utterly destroys ZFS. I'm curious if anyone has any clever ideas on why this huge disparity in performance exists. At the end of the day, my application will run on either filesystem, it just surprises me how much worse ZFS performs in this (admittedly edge case) scenario. --M
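A quick way to test that idea on the setup quoted above (pool/dataset name is a placeholder; the iozone line is the one from the original post, still issuing 4K I/O against the larger records):

  # match the filesystem recordsize to a multiple of the app's 4K blocks
  zfs set recordsize=16k ram/test
  iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g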
Re: [zfs-discuss] Making ZFS better: zfshistory
On 4/17/2010 9:03 AM, Edward Ned Harvey wrote: It would be cool to only list files which are different. Know of any way to do that? cmp Oh, no. Because cmp and diff require reading both files, it could take forever, especially if you have a lot of snapshots to check, with a large file or set of files... Well, what the heck. Might as well make it optional. Sometimes people will just want to check a single small file. I think I saw an ARC case go by recently for a new 'zfs diff' command. I think it allows you to compare 2 snapshots, or maybe the live filesystem and a snapshot, and see what's changed. It sounds really useful; hopefully it will integrate soon.
Re: [zfs-discuss] Secure delete?
On 4/16/2010 10:30 AM, Bob Friesenhahn wrote: On Thu, 15 Apr 2010, Eric D. Mudama wrote: The purpose of TRIM is to tell the drive that some # of sectors are no longer important so that it doesn't have to work as hard in its internal garbage collection. The sector size does not typically match the FLASH page size so the SSD still has to do some heavy lifting. It has to keep track of many small holes in the FLASH pages. This seems pretty complicated since all of this information needs to be well-preserved in non-volatile storage. But doesn't the TRIM command help here? If, as the OS goes along, it marks sectors as unused, then the SSD will have a lighter lift, only needing to read, for example, 1 out of 8 sectors (assuming sectors of 512 bytes, and 4K FLASH pages) before writing a new page with that 1 sector and 7 new ones. Additionally, in the background I would think it would be able to find a page with 3 in-use sectors and another with 5, for example, write all 8 to a new page, remap those sectors to the new location, and then pre-erase the 2 pages just freed up. How does that not help? -Kyle Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] SSD sale on newegg
On 4/6/2010 3:41 PM, Erik Trimble wrote: On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the latest recommendations for a log device? http://bit.ly/aL1dne The Vertex LE models should do well as ZIL (though not as well as an X25-E or a Zeus) for all non-enterprise users. The X25-M is still the best choice for a L2ARC device, but the Vertex Turbo or Corsair Nova are good if you're on a budget. If you really want an SSD as a boot drive, or just need something for L2ARC, the various Intel X25-V models are cheap, if not really great performers. I'd recommend one of these if you want an SSD for rpool, or if you need a large L2ARC for dedup (or similar) and can't afford anything in the X25-M price range. You should also be OK with a Corsair Reactor in this performance category. What about if you want to get one that you can use for both the rpool and a ZIL (for another data pool)? What if you want one for all 3 (rpool, ZIL, L2ARC)? -Kyle
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 4/4/2010 11:04 PM, Edward Ned Harvey wrote: Actually, it's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives; I have no experience with SSD's. The Sun disk label shipped on all the drives is set up to make the drive the standard size for that Sun part number. They have to do this since they (for many reasons) have many sources (diff. vendors, even diff. parts from the same vendor) for the actual disks they use for a particular Sun part number. Actually, if there is an fdisk partition and/or disklabel on a drive when it arrives, I'm pretty sure that's irrelevant. Because when I first connect a new drive to the HBA, of course the HBA has to sign and initialize the drive at a lower level than what the OS normally sees. So unless I do some sort of special operation to tell the HBA to preserve/import a foreign disk, the HBA will make the disk blank before the OS sees it anyway. That may be true. Though these days they may be spec'ing the drives to the manufacturers at an even lower level. So does your HBA have newer firmware now than it did when the first disk was connected? Maybe it's the HBA that is handling the new disks differently now than it did when the first one was plugged in? Can you down-rev the HBA FW? Do you have another HBA that might still have the older rev you could test it on? -Kyle
[zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
I've seen the Nexenta and EON webpages, but I'm not looking to build my own. Is there anything out there I can just buy? -Kyle
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit. It seems like it should be unnecessary. It seems like extra work. But based on my present experience, I reached the same conclusion. If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? Nothing. That's what. Actually, it's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives; I have no experience with SSD's. The Sun disk label shipped on all the drives is set up to make the drive the standard size for that Sun part number. They have to do this since they (for many reasons) have many sources (diff. vendors, even diff. parts from the same vendor) for the actual disks they use for a particular Sun part number. This isn't new; I believe IBM, EMC, HP, etc. all do it also for the same reasons. I'm a little surprised that the engineers would suddenly stop doing it only on SSD's. But who knows. -Kyle I take it back. Me. I am to prevent it from happening. And the technique to do so is precisely as you've said. First slice every drive to be a little smaller than actual. Then later if I get a replacement device for the mirror that's slightly smaller than the others, I have no reason to care.
Re: [zfs-discuss] *SPAM* Re: zfs send/receive - actual performance
On 3/27/2010 3:14 AM, Svein Skogen wrote: On 26.03.2010 23:55, Ian Collins wrote: On 03/27/10 09:39 AM, Richard Elling wrote: On Mar 26, 2010, at 2:34 AM, Bruno Sousa wrote: Hi, The jumbo-frames in my case give me a boost of around 2 mb/s, so it's not that much. That is about right. IIRC, the theoretical max is about 4% improvement, for MTU of 8KB. Now i will play with link aggregation and see how it goes, and of course i'm counting that incremental replication will be slower...but since the amount of data would be much less probably it will still deliver a good performance. Probably won't help at all because of the brain dead way link aggregation has to work. See Ordering of frames at http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol#Link_Aggregation_Control_Protocol Arse, thanks for reminding me Richard! A single stream will only use one path in a LAG. Doesn't (Open)Solaris have the option of setting the aggregate up as a FEC or in roundrobin mode? Solaris does offer what the Wiki describes as L4 or port number based hashing. I'm not sure what FEC is, but when I asked, round-robin isn't available as preserving packet ordering wouldn't be easy (possible?) that way. -Kyle //Svein ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
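For what it's worth, a rough sketch of setting up an aggregation with the L4 (port-based) hashing policy on recent OpenSolaris builds; the interface and aggregation names below are examples, not Bruno's actual setup:

  # Create an aggregation over two links, hashing on the L4 (TCP/UDP port) policy:
  dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr0
  # Or change the policy on an existing aggregation:
  dladm modify-aggr -P L4 aggr0
  # LACP can be turned on if the switch supports it:
  dladm modify-aggr -L active aggr0

Even with L4 hashing, a single TCP stream (like one zfs send) still rides one physical link; it's multiple concurrent streams that get spread across the aggregation, which is the limitation being discussed above.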
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 3/30/2010 2:44 PM, Adam Leventhal wrote: Hey Karsten, Very interesting data. Your test is inherently single-threaded so I'm not surprised that the benefits aren't more impressive -- the flash modules on the F20 card are optimized more for concurrent IOPS than single-threaded latency. Yes it would be interesting to see the Avg numbers for 10 or more clients (or jobs on one client) all performing that same test. -Kyle Adam On Mar 30, 2010, at 3:30 AM, Karsten Weiss wrote: Hi, I did some tests on a Sun Fire x4540 with an external J4500 array (connected via two HBA ports). I.e. there are 96 disks in total configured as seven 12-disk raidz2 vdevs (plus system, spares, unused disks) providing a ~ 63 TB pool with fletcher4 checksums. The system was recently equipped with a Sun Flash Accelerator F20 with 4 FMod modules to be used as log devices (ZIL). I was using the latest snv_134 software release. Here are some first performance numbers for the extraction of an uncompressed 50 MB tarball on a Linux (CentOS 5.4 x86_64) NFS-client which mounted the test filesystem (no compression or dedup) via NFSv3 (rsize=wsize=32k,sync,tcp,hard).

  standard ZIL:         7m40s  (ZFS default)
  1x SSD ZIL:           4m07s  (Flash Accelerator F20)
  2x SSD ZIL:           2m42s  (Flash Accelerator F20)
  2x SSD mirrored ZIL:  3m59s  (Flash Accelerator F20)
  3x SSD ZIL:           2m47s  (Flash Accelerator F20)
  4x SSD ZIL:           2m57s  (Flash Accelerator F20)
  disabled ZIL:         0m15s  (local extraction 0m0.269s)

I was not so much interested in the absolute numbers but rather in the relative performance differences between the standard ZIL, the SSD ZIL and the disabled ZIL cases. Any opinions on the results? I wish the SSD ZIL performance was closer to the disabled ZIL case than it is right now. ATM I tend to use two F20 FMods for the log and the two other FMods as L2ARC cache devices (although the system has lots of system memory i.e. the L2ARC is not really necessary). But the speedup of disabling the ZIL altogether is appealing (and would probably be acceptable in this environment). -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
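For reference, attaching F20 FMods (or any SSDs) as log and cache devices is just zpool add; the pool and device names below are placeholders, not Karsten's actual FMod paths:

  # Two FMods as (unmirrored) log devices:
  zpool add tank log c5t0d0 c5t1d0
  # ...or as a mirrored log instead:
  zpool add tank log mirror c5t0d0 c5t1d0
  # The other two FMods as L2ARC cache devices:
  zpool add tank cache c5t2d0 c5t3d0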
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
On 3/10/2010 3:27 PM, Robert Thurlow wrote: As said earlier, it's the string returned from the reverse DNS lookup that needs to be matched. So, to make a long story short, if you log into the server from the client and do who am i, you will get the host name you need for the share. Another test (for a server configured as a DNS client, LDAP would be different) is to run 'nslookup client-ip' (or the dig equivalent.) The name returned is the one that needs to be in the share config. -Kyle Rob T ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
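To make that concrete, a hypothetical example (host and dataset names invented): if the client's address is 192.0.2.25, check what the server resolves it back to, and use exactly that string in the share options.

  # On the server, find the name the reverse lookup returns:
  nslookup 192.0.2.25
  # If it comes back as client25.example.com, use that name in the share:
  zfs set sharenfs='rw,root=client25.example.com' tank/export/data

If the reverse lookup returns a short name rather than a fully qualified one, it's the short form that has to appear in the share options.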
Re: [zfs-discuss] sharemgr
dick hoogendijk wrote: glidic anthony wrote: I have a solution using zfs set sharenfs=rw,nosuid zpool but I prefer to use the sharemgr command. Then you prefer wrong. To each their own. ZFS filesystems are not shared this way. They can be. I do it all the time. There's nothing technical that dictates that sharemgr can't be used on ZFS filesystems. Just because ZFS provides an alternate way, that doesn't make it the only way, or even the 'one true way.' About the only advantage I can see of using zfs share is inheritance. If you don't need that, then sharemgr is just as good, and there are cases where it may be simpler - For instance, I loopback mount many many ISO's, and need to use sharemgr to share those anyway, so I find it much more convenient to manage all my shares in one place with one tool. If sharemgr could (optionally) manage inherited sharing on ZFS filesystems, then I think it'd be cleaner to suggest to users to use the one system-wide sharing tool, rather than one that only works for one filesystem. I can't remember them right now, but I think there are other commands where ZFS seems to have done the same thing and I can't figure out why that's the trend? As great as ZFS is, it won't ever be the only filesystem around, ISOs (at least) will be around for a long time still. Why start forcing users to learn new tools for each filesystem type? Read up on ZFS and NFS. What makes you think he didn't? While the docs do describe how you can optionally use zfs share (which he clearly read about since he mentioned it) they don't prohibit using sharemgr. I read his question as How can I get sharemgr to set up sharing so that it gets inherited on child filesystems? Apparently the answer to that question is You can't. If you want to set it up only once you need zfs share, and if you really want to use sharemgr you need to share each filesystem separately. Maybe someday that will change. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
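A rough sketch of the sharemgr flow being described; the group and path names are examples only:

  # Create a share group managed through sharemgr:
  sharemgr create -P nfs fileshares
  # Add each filesystem or lofi-mounted ISO directory to the group:
  sharemgr add-share -s /export/isos/sol10u8 fileshares
  sharemgr add-share -s /tank/home/alice fileshares
  # Review everything shared through the groups:
  sharemgr show -vp

The catch discussed above remains: each ZFS child filesystem still has to be added as its own share; there's no inheritance the way sharenfs provides.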
Re: [zfs-discuss] ZFS directory and file quota
Darren J Moffat wrote: Jozef Hamar wrote: Hi all, I can not find any instructions on how to set the file quota (i.e. maximum number of files per filesystem/directory) or directory quota (maximum size that files in particular directory can consume) in ZFS. That is because it doesn't exist. I understand ZFS has no support for this. Am I right? If I am, are there any plans to include this in the next releases of OpenSolaris/Solaris? Why would you want to do that rather than set a maximum amount of space a filesystem, user or group can consume? Last I checked NetApp had a 'directory quota' concept, but I don't know if it could be used on just any directory, or only on upper-level directories. Granted, with ZFS you can just make any directory at any level a new FS, and get the same effect, but that can be heavyweight, and have undesired side effects. What is the real problem you are trying to solve by restricting the number of files that can be created? I imagine it's one that was previously solved with older unix/ufs file quotas. Though I can't imagine a use for that now that lack of inodes is not likely to be a problem any time soon. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
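For the size-limit half of the question, the ZFS-native approach is a dataset per directory with a quota; the names below are illustrative only:

  # Give the projects tree its own filesystem with a 50G cap:
  zfs create -o quota=50G tank/data/projects
  # Or cap only the live data, excluding snapshot space:
  zfs set refquota=50G tank/data/projects

There is still nothing equivalent for capping the number of files.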
Re: [zfs-discuss] More Dedupe Questions...
Hi Darren, More below... Darren J Moffat wrote: Tristan Ball wrote: Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve? A stream can be deduped even if the on-disk format isn't and vice versa. Is the send dedup'ing more efficient if the filesystem is already deduped? If both are enabled do they share anything? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
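For anyone following along, the deduplicated stream is requested on the sending side; something like the following, with invented dataset names, and assuming a build that has stream dedup (the -D flag):

  # Ask for a deduplicated replication stream regardless of whether
  # dedup=on is set on the source filesystem:
  zfs send -D -R tank/data@snap1 | zfs recv -F backup/data

Whether the send can reuse the on-disk dedup tables when dedup is already enabled is exactly the question being asked here; the flag itself doesn't require it.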
Re: [zfs-discuss] zfs-discuss gone from web?
Jacob Ritorto wrote: With the web redesign, how does one get to zfs-discuss via the opensolaris.org website? Sorry for the ot question, but I'm becoming desperate after clicking circular links for the better part of the last hour :( You can get the web pages to load? All I get are The connection has timed out. The server at opensolaris.org is taking too long to respond. Something is messed up. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Apple cans ZFS project
David Magda wrote: On Oct 24, 2009, at 08:53, Joerg Schilling wrote: The article that was mentioned a few hours ago did mention licensing problems without giving any kind of evidence for this claim. If there is evidence, I would be interested in knowing the background, otherwise it looks to me like FUD. I'm guessing that you'll never see direct evidence given the sensitivity that these negotiations can take. All you'll get is rumours and leaks of various levels of reliability. Apple can currently just take the ZFS CDDL code and incorporate it (like they did with DTrace), but it may be that they wanted a private license from Sun (with appropriate technical support and indemnification), and the two entities couldn't come to mutually agreeable terms. Indemnification, I think, really could have been a sticking point. I believe that the NetApp-Sun legal disputes are still working their way through the legal process. If I were Apple I would have wanted some protection in the case Sun loses the case. I don't think I'd want to be target #2 with precedent already set. That said, from what I've read, I don't believe NetApp has a leg to stand on. But then again I'm not a lawyer. ;) -Kyle Oh well. I'm sure Apple can come up with something good in the FS team, but it's a shame that the wheel has to be re-invented when there's a production-ready option available. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
Mike Bo wrote: Once data resides within a pool, there should be an efficient method of moving it from one ZFS file system to another. Think Link/Unlink vs. Copy/Remove. Here's my scenario... When I originally created a 3TB pool, I didn't know the best way to carve up the space, so I used a single, flat ZFS file system. Now that I'm more familiar with ZFS, managing the sub-directories as separate file systems would have made a lot more sense (separate policies, snapshots, etc.). The problem is that some of these directories contain tens of thousands of files and many hundreds of gigabytes. Copying this much data between file systems within the same disk pool just seems wrong. I hope such a feature is possible and not too difficult to implement, because I'd like to see this capability in ZFS. Alternatively (and I don't know if this is feasible), it might be easier and/or better to be able to set those properties on, and independently snapshot, regular old subdirectories. Just an idea. -Kyle Regards, mikebo ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS port to Linux
Bob Friesenhahn wrote: On Fri, 23 Oct 2009, Anand Mitra wrote: One of the biggest questions around this effort would be “licensing”. As far as our understanding goes; CDDL doesn’t restrict us from modifying ZFS code and releasing it. However GPL and CDDL code cannot be mixed, which implies that ZFS cannot be compiled into Linux Kernel which is GPL. But we believe the way to get around this issue is to build ZFS as a module with a CDDL license, it can still be loaded in the Linux kernel. Though it would be restricted to use the non-GPL symbols, but as long as that rule is adhered to there is no problem of legal issues. The legal issues surrounding GPLv2 are what constitutes the Program and a work based on the Program. In the case of Linux, the Program is usually the Linux kernel, and things like device drivers become a work based on the Program. Conjoining of source code is not really the issue. The issue is what constitutes the Program. About 10 years ago I had a long discussion with RMS and the (presumably) injured party related to dynamically loading a module linked to GPLv2 code into our application. RMS felt that loading that module caused the entire work to become a work based on the Program while I felt that the module was the work based on the Program but that the rest of our application was not since that module could be deleted without impact to the application. Regardless, it has always seemed to me that (with sufficient care), a loadable module can be developed which has no linkages to other code, yet can still be successfully loaded and used. In this case it seems that the module could be loaded into the Linux kernel without itself being distributed under GPL terms. Disclaimer: I am not a lawyer, nor do I play one on TV. I could be very wrong about this. Along these lines, it's always struck me that most of the restrictions of the GPL fall on the entity who distributes the 'work' in question. I would think that distributing the source to a separate original work for a module leaves that responsibility up to whoever compiles it and loads it. This means the end-users, as long as they never distribute what they create, are (mostly?) unaffected by the Kernel's GPL, and if they do distribute it, the burden is on them. Arguably that line might even be shifted from the act of compiling it, to the act of actually loading (linking) it into the Kernel, so that distributing a compiled module might even work the same way. I'm not so sure about this though. Presumably compiling it before distribution would require the use of include files from the kernel, and that seems a grey area to me. Maybe clean room include files could be created? -Kyle Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Export, Import = Windows sees wrong groups in ACLs
Owen Davies wrote: Thanks. I took a look and that is exactly what I was looking for. Of course I have since just reset all the permissions on all my shares but it seems that the proper way to swap UIDs for users with permissions on CIFS shares is to: Edit /etc/passwd Edit /var/smb/smbpasswd And to change GIDs for groups used on CIFS shares you need to both: Edit /etc/group Edit /var/smb/smbgroup.db Is there a better way to do this than manually editing each file (or db)? I've just started reading the CIFS docs recently, so I could be wrong But I think the smb files were populated when you added the mappings (back when /etc/passwd and /etc/group were wrong.) I bet, if you removed the mappings, fixed the UNIX files, and recreated the mappings then the SMB files would be 'fixed'. It may not be easier, but it probably is better in the case that there are other housekeeping things the map commands do. -Kyle I don't think there is much of this sort of integration yet so that tools update things in a consistent way on both the UNIX side and the CIFS side. Thanks, Owen Davies ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pulsing write performance
Scott Meilicke wrote: I am still not buying it :) I need to research this to satisfy myself. I can understand that the writes come from memory to disk during a txg write for async, and that is the behavior I see in testing. But for sync, data must be committed, and a SSD/ZIL makes that faster because you are writing to the SSD/ZIL, and not to spinning disk. Eventually that data on the SSD must get to spinning disk. But the txg (which may contain more data than just the sync data that was written to the ZIL) is still written from memory. Just because the sync data was written to the ZIL, doesn't mean it's not still in memory. -Kyle To the books I go! -Scott ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote: On 6-Aug-09, at 11:32 , Thomas Burgess wrote: i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of today's computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp But I can't confirm it will show both cards as separate disks… My read is that it won't (which is supported by the single SATA data connector,) but it will do the mirroring for you. I know that I generally prefer to let ZFS handle the redundancy for me, but for you it may be enough to let this do the mirroring for the root pool. It seems too expensive to get 2. Do they have a cheaper one that takes only 1 CF card? -Kyle A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pool Layout Advice Needed
Adam Sherman wrote: On 6-Aug-09, at 11:50 , Kyle McDonald wrote: i've seen some people use usb sticks, and in practice it works on SOME machines. The biggest difference is that the bios has to allow for usb booting. Most of today's computers DO. Personally i like compact flash because it is fairly easy to use as a cheap alternative to a hard drive. I mirror the cf drives exactly like they are hard drives so if one fails i just replace it. USB is a little harder to do that with because they are just not as consistent as compact flash. But honestly it should work and many people do this. This product looks really interesting: http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp But I can't confirm it will show both cards as separate disks… My read is that it won't (which is supported by the single SATA data connector,) but it will do the mirroring for you. Turns out the FAQ page explains that it will not, too bad. I know that I generally prefer to let ZFS handle the redundancy for me, but for you it may be enough to let this do the mirroring for the root pool. I'm with you there. It seems too expensive to get 2. Do they have a cheaper one that takes only 1 CF card? I just ordered a pair of the Syba units, cheap enough to test out anyway. Oh. I was looking and if you have an IDE socket, this will do separate master/slave devices: (no IDE cable needed, it plugs right into the MB - There's another that uses a cable if you prefer.) http://www.addonics.com/products/flash_memory_reader/adeb44idecf.asp And 2 of these (which look remarkably like the Syba ones) would work too: http://www.addonics.com/products/flash_memory_reader/adsahdcf.asp They're only $30 each, so 2 of those are less than the dual one. -Kyle Now to find some reasonably priced 8GB CompactFlash cards… Thanks, A. -- Adam Sherman CTO, Versature Corp. Tel: +1.877.498.3772 x113 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Shrinking a zpool?
Martin wrote: C, I appreciate the feedback and like you, do not wish to start a side rant, but rather understand this, because it is completely counter to my experience. Allow me to respond based on my anecdotal experience. What's wrong with make a new pool.. safely copy the data. verify data and then delete the old pool.. You missed a few steps. The actual process would be more like the following. 1. Write up the steps and get approval from all affected parties -- In truth, the change would not make it past step 1. Maybe, but maybe not; see below... 2. Make a new pool 3. Quiesce the pool and cause a TOTAL outage during steps 4 through 9 That's not entirely true. You can use ZFS send/recv to do the major first pass of #4 (and #5 against the snapshot) live before the total outage. Then after you quiesce everything, you could use an incremental send/recv to copy the changes since then quickly, reducing down time. I'd probably run a second full verify anyway, but in theory, I believe the ZFS checksums are used in the send/recv process to ensure that there isn't any corruption, so after enough positive experience, I might start to skip the second verify. This should greatly reduce the length of the down time. Everyone. and then one day [months or years later] wants to shrink it... Business needs change. Technology changes. The project was a pilot and canceled. The extended pool didn't meet verification requirements, e.g. performance, and the change must be backed out. In an Enterprise, a change for performance should have been tested on another identical non-production system before being implemented on the production one. I'd have to concur there are more useful things out there. OTOH... That's probably true and I have not seen the priority list. I was merely amazed at the number of Enterprises don't need this functionality posts. All that said, as a personal home user, this is a feature I'm hoping for all the time. :) -Kyle Thanks again, Marty ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
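A minimal sketch of the live-copy-then-catch-up flow described above; the pool and snapshot names are invented:

  # First pass, while the old pool is still live:
  zfs snapshot -r tank@migrate1
  zfs send -R tank@migrate1 | zfs recv -Fd newtank
  # Quiesce writers, then send only what changed since the first pass:
  zfs snapshot -r tank@migrate2
  zfs send -R -i tank@migrate1 tank@migrate2 | zfs recv -Fd newtank

The incremental pass is what keeps the actual outage window short.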
Re: [zfs-discuss] Shrinking a zpool?
Jacob Ritorto wrote: Is this implemented in OpenSolaris 2008.11? I'm moving my filer's rpool to an ssd mirror to free up bigdisk slots currently used by the os and need to shrink rpool from 40GB to 15GB. (only using 2.7GB for the install). Your best bet would be to install the new ssd drives, create a new pool, snapshot the existing pool and use ZFS send/recv to migrate the data to the new pool. There are docs around about how to install grub and the boot blocks on the new devices also. After that remove (export, don't destroy yet!) the old drives, and reboot to see how it works. If you have no problems, (and I don't think there's anything technical that would keep this from working,) then you're good. Otherwise put the old pool back in. :) -Kyle thx jake ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
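As a hedged sketch of the boot-related steps (the pool, BE, and disk names are placeholders; check the current root-pool docs before relying on this):

  # After receiving the datasets into the new pool on the SSD mirror:
  zpool set bootfs=newrpool/ROOT/snv_xxx newrpool
  # Install the boot blocks on each half of the SSD mirror (x86/GRUB):
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t1d0s0

On SPARC the equivalent step is installboot with the ZFS bootblk rather than installgrub.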
Re: [zfs-discuss] Sol10u7: can't zpool remove missing hot spare
Will Murnane wrote: I'm using Solaris 10u6 updated to u7 via patches, and I have a pool with a mirrored pair and a (shared) hot spare. We reconfigured disks a while ago and now the controller is c4 instead of c2. The hot spare was originally on c2, and apparently on rebooting it didn't get found. So, I looked up what the new name for the hot spare was, then added it to the pool with zpool add home1 spare c4t19d0. I then tried to remove the original name for the hot spare:

r...@box:~# zpool remove home1 c2t0d8
r...@box:~# zpool status home1
  pool: home1
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        home1        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c4t17d0  ONLINE       0     0     0
            c4t24d0  ONLINE       0     0     0
        spares
          c2t0d8     UNAVAIL   cannot open
          c4t19d0    AVAIL

errors: No known data errors

So, how can I convince the pool to release its grasp on c2t0d8? Have you tried making a sparse file with mkfile in /tmp and then zpool replace'ing c2t0d8 with the file, and then zpool remove'ing the file? I don't know if it will work, but at least at the time of the remove, the device will exist. -Kyle Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
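Spelled out, the suggestion is roughly the following; the size and path are made up, and as noted above it's untested and may well be refused for a spare device:

  # Create a sparse file at least as large as the old spare device:
  mkfile -n 1g /tmp/fakespare
  # Substitute it for the missing spare, then drop it from the pool:
  zpool replace home1 c2t0d8 /tmp/fakespare
  zpool remove home1 /tmp/fakespare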
Re: [zfs-discuss] Supported Motherboard SATA controller chipsets?
Volker A. Brandt wrote: I'm currently trying to decide between a MB with that chipset and another that uses the nVidia 780a and nf200 south bridge. Is the nVidia SATA controller well supported? (in AHCI mode?) Be careful with nVidia if you want to use Samsung SATA disks. There is a problem with the disk freezing up. This bit me with our X2100M2 and X2200M2 systems. I don't know if it's related to your issue, but I have also seen comments around about the nv-sata windows drivers hanging up when formatting drives larger than 1024GB. But that's been fixed in the latest nvidia windows drivers. Does that sound related, or like something different? -Kyle Regards -- Volker ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Supported Motherboard SATA controller chipsets?
Hi all, I think I've read that the AMD 790FX/750SB chipset's SATA controller is supported, but may have recently had bugs? I'm currently trying to decide between a MB with that chipset and another that uses the nVidia 780a and nf200 south bridge. Is the nVidia SATA controller well supported? (in AHCI mode?) At the moment I'm leaning toward that MB (ASUS M3N-HT) since it seems to still be available. Whereas the AMD one (ASUS M3A79-T) seems harder to find. There is the ASUS M4A79T which is almost the same board, but it has 1 less SATA port - which is also the reason I'm not looking at the M4N82 nVidia board. I wanted to run all this through something like the driver detection tool, but since I haven't bought the boards yet, that's kind of tough. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] feature proposal
dick hoogendijk wrote: On Fri, 31 Jul 2009 18:38:16 +1000 Tristan Ball tristan.b...@leica-microsystems.com wrote: Because it means you can create zfs snapshots from a non solaris/non local client... Like a linux nfs client, or a windows cifs client. So if I want a snapshot of i.e. rpool/export/home/dick I can do a zfs snapshot rpool/export/home/dick, But your command requires that it be run on the NFS/CIFS *server* directly. The 'mkdir' command version can be run on the server or on any NFS or CIFS client. It's possible (likely even) that regular users would not be allowed to login to server machines, but if given the right access, they can still use the mkdir version to create their own snapshots from a client. but what is the exact syntax for the same snapshot using this other method? As I understand it, if rpool/export/home/dick is mounted on /home/dick, then the syntax would be cd /home/dick/.zfs/snapshot mkdir mysnapshot -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] shrinking a zpool - roadmap
Ralf Gans wrote: Jumpstart puts a loopback mount into the vfstab, and the next boot fails. Solaris will do the mountall before ZFS starts, so the filesystem service fails and you don't even have an sshd to log in over the network. This is why I don't use the mountpoint settings in ZFS. I set them all to 'legacy', and put them in the /etc/vfstab myself. I keep many .ISO files on a ZFS filesystem, and I LOFI mount them onto subdirectories of the same ZFS tree, and then (since they are for Jumpstart) loop back mount parts of each of the ISO's into /tftpboot. When you've got to manage all this other stuff in /etc/vfstab anyway, it's easier to manage ZFS there too. I don't see it as a hardship, and I don't see the value of doing it in ZFS to be honest (unless every filesystem you have is in ZFS maybe.) The same with sharing this stuff through NFS. Since the LOFI mounts are separate filesystems, I have to share them with share (or sharemgr) and it's easier to share the ZFS directories through those commands at the same time. I must be missing something, but I'm not sure I get the rationale behind duplicating all this admin stuff inside ZFS. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
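For anyone who wants to copy the legacy-mount approach, a small sketch with invented dataset and mount point names:

  # Tell ZFS to stay out of the mounting business for this dataset:
  zfs set mountpoint=legacy tank/isos
  # Then add an ordinary /etc/vfstab line for it:
  # device to mount   device to fsck   mount point    FS type  pass  mount at boot  options
  tank/isos           -                /export/isos   zfs      -     yes            -

After that, the lofi and loopback mounts of the ISO images can be listed in vfstab right next to it.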
Re: [zfs-discuss] feature proposal
Andriy Gapon wrote: What do you think about the following feature? Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties except for the magic property which is off. Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this. But now that quotas are working properly, why would you want to continue the hack of 1 FS per user? I'm seriously curious here. In my view it's just more work. A more cluttered zfs list, and share output. A lot less straightforward and simple, too. Why bother? What's the benefit? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
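For context, the working properly bit presumably refers to the per-user quotas recent builds added, which remove the old reason for one filesystem per user; the user and dataset names below are examples:

  # Cap a single user's usage inside one shared home filesystem:
  zfs set userquota@alice=20G tank/home
  # And see who is using what:
  zfs userspace tank/home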
Re: [zfs-discuss] feature proposal
Darren J Moffat wrote: Kyle McDonald wrote: Andriy Gapon wrote: What do you think about the following feature? Subdirectory is automatically a new filesystem property - an administrator turns on this magic property of a filesystem, after that every mkdir *in the root* of that filesystem creates a new filesystem. The new filesystems have default/inherited properties except for the magic property which is off. Right now I see this as being mostly useful for /home. Main benefit in this case is that various user administration tools can work unmodified and do the right thing when an administrator wants a policy of a separate fs per user. But I am sure that there could be other interesting uses for this. But now that quotas are working properly, why would you want to continue the hack of 1 FS per user? hack ? Different usage cases! Why bother? What's the benefit? The benefit is that users can control their own snapshot policy, they can create and destroy their own sub datasets, send and recv them etc. We can also delegate specific properties to users if we want as well. This is exactly how I have the builds area set up on our ONNV build machines for the Solaris security team. Sure the output of zfs list is long - but I don't care about that. I can imagine a use for a builds area. 1 FS per build - I don't know. But why link it to the mkdir? Why not make the build scripts do the zfs create outright? When encryption comes along having a separate filesystem per user is a useful deployment case because it means we can deploy with separate keys for each user (granted may be less interesting if they only access their home dir over NFS/CIFS but still useful). I have a prototype PAM module that uses the user's login password as the ZFS dataset wrapping key and keeps that in sync with the user's login password on password change. Encryption is an interesting case. User Snapshots I'd need to think about more. Couldn't the other properties be delegated on directories? Maybe I'm just getting old. ;) I still think having the zpool not automatically include a filesystem, and having ZFS containers was a useful concept. And I still use share (and now sharemgr) to manage my shares, and not ZFS share. Oh well. :) -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
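A rough example of the delegation Darren describes, with invented names: users can be given just enough rights to manage their own dataset.

  # Let alice snapshot, send, and create children of her own home dataset:
  zfs allow alice snapshot,send,create,mount,destroy tank/home/alice
  # Review what's been delegated:
  zfs allow tank/home/alice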
Re: [zfs-discuss] SSD's and ZFS...
Tristan Ball wrote: It just so happens I have one of the 128G and two of the 32G versions in my drawer, waiting to go into our DR disk array when it arrives. Hi Tristan, Just so I can be clear, What model/brand are the drives you were testing? -Kyle I dropped the 128G into a spare Dell 745 (2GB ram) and used a Ubuntu liveCD to run some simple iozone tests on it. I had some stability issues with Iozone crashing however I did get some results... Attached are what I've got. I intended to do two sets of tests, one for each of sequential reads, writes, and a random IO mix. I also wanted to do a second set of tests, running a streaming read or streaming write in parallel with the random IO mix, as I understand many SSD's have trouble with those kind of workloads. As it turns out, so did my test PC. :-) I've used 8K IO sizes for all the stage one tests - I know I might get it to go faster with a larger size, but I like to know how well systems will do when I treat them badly! The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes, 5000 on reads. The Streaming write load and random over writes were started at the same time - although I didn't see which one finished first, so it's possible that the stream finished first and allowed the random run to finish strong. Basically take these numbers with several large grains of salt! Interestingly, the random IO mix doesn't slow down much, but the streaming writes are hurt a lot. Regards, Tristan. -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of thomas Sent: Friday, 24 July 2009 5:23 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] SSD's and ZFS... I think it is a great idea, assuming the SSD has good write performance. This one claims up to 230MB/s read and 180MB/s write and it's only $196. http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393 Compared to this one (250MB/s read and 170MB/s write) which is $699. Are those claims really trustworthy? They sound too good to be true! MB/s numbers are not a good indication of performance. What you should pay attention to are usually random IOPS write and read. They tend to correlate a bit, but those numbers on newegg are probably just best case from the manufacturer. In the world of consumer grade SSDs, Intel has crushed everyone on IOPS performance.. but the other manufacturers are starting to catch up a bit. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The importance of ECC RAM for ZFS
Michael McCandless wrote: I've read in numerous threads that it's important to use ECC RAM in a ZFS file server. My question is: is there any technical reason, in ZFS's design, that makes it particularly important for ZFS to require ECC RAM? I think, basically the idea is, that if you're going to use ZFS to protect your data from this sort of thing through the path to the stable storage, then it seems like a shame (or a waste?) not to equally protect the data both before it's given to ZFS for writing, and after ZFS reads it back and returns it to you. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)
Bob Friesenhahn wrote: Of course, it is my understanding that the zfs slog is written sequentially so perhaps this applies instead: Actually, reading up on these drives I've started to wonder about the slog writing pattern. While these drives do seem to do a great job at random writes, most of the promise shows at sequential writes, so does the slog attempt to write sequentially through the space given to it? Also there are all sorts of analyses out there about how the drives always attempt to write new data to the pages and blocks they know are empty since they can't overwrite one page (usually 4k) without erasing the whole (512k) block the page is in. This leads to a drop in write performance after all the space (both the space you paid for, and any extra space the vendor put in to work around this issue) has been used once. This shows up in regular filesystems because when a file is deleted the drive only sees a new (over)write of some meta-data so the OS can record that the file is gone, but the drive is never told that the blocks the file was occupying are now free and can be pre-erased at the drive's convenience. The drive vendors have come up with a new TRIM command, which some OS's (Win7) are talking about supporting in their filesystems. Obviously for use only as a slog device ZFS itself doesn't need (until people start using SSD's as regular pool devices) to know how to use TRIM, but I would think that the slog code would need to use it in order to keep write speeds up and latencies down. No? If so, what's the current consensus, thoughts, plans, etc. on if and when TRIM will be usable in Solaris/ZFS? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slog writing patterns vs SSD tech.
Miles Nordin wrote: km == Kyle McDonald kmcdon...@egenera.com writes: km These drives do seem to do a great job at random writes, most km of the promise shows at sequential writes, so does the slog km attempt to write sequentially through the space given to it? thwack NO! Everyone who is using the code, writing the code, and building the systems says, io/s is the number that matters. If you've got some experience otherwise, fine, odd things turn up all the time. But AFAICT the consensus is clear right now. Yeah I know. I get it. I screwed up and used the wrong term. OK? I agree with you. Still when all the previously erased pages are gone, write latencies go up (drastically - in some cases worse than a spinning HD,) and io/s goes down. So what I really wanted to get into was the question below. km they can't overwrite one page (usually 4k) without erasing the km whole (512k) block the page is in. don't presume to get into the business of their black box so far. I'm not. Guys like this are: http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8 That's almost certainly not what they do. They probably do COW like ZFS and (yaffs and jffs2 and ubifs), so they will do the 4k writes to partly-empty pages until the page is full. In the background a gc thread will evacuate and rewrite pages that have become spattered with unreferenced sectors. That's where the problem comes in. They have no knowledge of the upper filesystem, and don't know what previously written blocks are still referenced. When the OS FS rewrites a directory to remove a pointer to the string of blocks the file used to use, and updates its list of which LBA sectors are now free vs. in use, it probably happens pretty much exactly like you say. But that doesn't let the SSD mark the sectors the file used as unreferenced, so the gc thread can't evacuate them ahead of time and add them to the empty page pool. km The Drive vendors have come up with a new TRIM command, which km some OS's (Win7) are talking about supporting in their km Filesystems. this would be useful for VM's with thin-provisioned disks, too. True. Keeping or putting the 'holes' back in the 'holey' disk files when the VM frees up space would be very useful. km I would think that the slog code would need to use it in order km to keep write speeds up and latencies down. No? read the goofy gamer site review please. No, not with the latest intel firmware, it's not needed. I did read at least one review that compared old and new firmware on the Intel M model. In that I'm pretty sure they still saw a performance hit (in latency) when the entire drive had been written to. It may have taken longer to hit, and it may have not been as drastic, but it was still there. Which review are you talking about? So what if Intel has fixed it. Not everyone is going to use the Intel drives. If the TRIM command (assuming it can help at all) can keep the other brands and models performing close to how they performed when new, then I'd say it's useful in the ZFS slogs too - Just because one vendor might have made it unnecessary, doesn't mean it is for everyone. Does it? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD's and ZFS...
F. Wessels wrote: Thanks posting this solution. But I would like to point out that bug 6574286 removing a slog doesn't work still isn't resolved. A solution is under it's way, according to George Wilson. But in the mean time, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot for example export your data pool, pull the drives and leave the root pool. In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool. If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too. Don't get me wrong I would like such a setup a lot. But I'm not going to implement it until the slog can be removed or the pool be imported without the slog. In the mean time can someone confirm that in such a case, root pool and zil in two slices and mirrored, that the write cache can be enabled with format? Only zfs is using the disk, but perhaps I'm wrong on this. There have been post's regarding enabling the write_cache. But I couldn't find a conclusive answer for the above scenario. When you have just the root pool on a disk, ZFS won't enable the write cache by default. I think you can manually enable it but I don't know the dangers. Adding the slog shouldn't be any different. To be honest, I don't know how closely the write caching on a SSD matches what a moving disk has. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
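For the write-cache question, the manual knob lives under format's expert mode; a sketch follows (the disk name is a placeholder, and only do this if ZFS is the sole user of the device):

  format -e c3t0d0
  # then at the format> prompt:
  #   cache
  #   write_cache
  #   display       (show the current state)
  #   enable        (turn it on)

Whether that's safe for a disk carrying both a root pool slice and a slog slice is exactly what's being asked here.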
Re: [zfs-discuss] SSD's and ZFS...
Brian Hechinger wrote: On Thu, Jul 23, 2009 at 10:28:38AM -0400, Kyle McDonald wrote: In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool. I didn't think you could add a slog to the root pool anyway. Or has that changed in recent builds? I'm a little behind on my SXCE versions, been too busy to keep up. :) I don't know either. It's not really what I was looking to do so I never even thought of it. :) When you have just the root pool on a disk, ZFS won't enable the write cache by default. I don't think this is limited to root pools. None of my pools (root or non-root) seem to have the write cache enabled. Now that I think about it, all my disks are hidden behind an LSI1078 controller so I'm not sure what sort of impact that would have on the situation. When you give the full disk (device name 'cWtXdY' - with no 'sZ') then ZFS will usually instruct the drive to enable write caching. You're right though, if your drives are really something like single drive RAID 0 LUNs, then who knows what happens. -Kyle -brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD's and ZFS...
Richard Elling wrote: On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote: F. Wessels wrote: Thanks posting this solution. But I would like to point out that bug 6574286 removing a slog doesn't work still isn't resolved. A solution is under it's way, according to George Wilson. But in the mean time, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot for example export your data pool, pull the drives and leave the root pool. In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool. If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too. Mirror the slog to match your mirrored root pool. Yep. That was the plan. I was just explaining that not being able to remove the slog wasn't an issue for me since I planned on always having that device available. I was more curious about whether there were any down sides to sharing the SSD between the root pool and the slog? Thanks for the valuable input, Richard. -Kyle Don't get me wrong I would like such a setup a lot. But I'm not going to implement it until the slog can be removed or the pool be imported without the slog. In the mean time can someone confirm that in such a case, root pool and zil in two slices and mirrored, that the write cache can be enabled with format? Only zfs is using the disk, but perhaps I'm wrong on this. There have been post's regarding enabling the write_cache. But I couldn't find a conclusive answer for the above scenario. When you have just the root pool on a disk, ZFS won't enable the write cache by default. I think you can manually enable it but I don't know the dangers. Adding the slog shouldn't be any different. To be honest, I don't know how closely the write caching on a SSD matches what a moving disk has. Write caches only help hard disks. Most (all?) SSDs do not have volatile write buffers. Volatile write buffers are another bad thing you can forget when you go to SSDs :-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
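To make the shared-SSD layout concrete, here is a sketch with invented slice numbers and names: each SSD carries the root pool on slice 0 and the data pool's log on slice 1, and both are mirrored.

  # Root pool mirror on slice 0 of each SSD (set up at install time):
  #   rpool: mirror c4t0d0s0 c4t1d0s0
  # Data pool's mirrored slog on slice 1 of the same SSDs:
  zpool add datapool log mirror c4t0d0s1 c4t1d0s1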
Re: [zfs-discuss] SSD's and ZFS...
Richard Elling wrote: On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote: Richard Elling wrote: On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote: F. Wessels wrote: Thanks posting this solution. But I would like to point out that bug 6574286 removing a slog doesn't work still isn't resolved. A solution is under it's way, according to George Wilson. But in the mean time, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot for example export your data pool, pull the drives and leave the root pool. In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool. If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too. Mirror the slog to match your mirrored root pool. Yep. That was the plan. I was just explaining that not being able to remove the slog wasn't an issue for me since I planned on always having that device available. I was more curious about whether there were any diown sides to sharing the SSD between the root pool and the slog? I think it is a great idea, assuming the SSD has good write performance. This one claims up to 230MB/s read and 180MB/s write and it's only $196. http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393 Compared to this one (250MB/s read and 170MB/s write) which is $699. Are those claims really trustworthy? They sound too good to be true! -Kyle -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD's and ZFS...
Kyle McDonald wrote: Richard Elling wrote: On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote: Richard Elling wrote: On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote: F. Wessels wrote: Thanks posting this solution. But I would like to point out that bug 6574286 removing a slog doesn't work still isn't resolved. A solution is under it's way, according to George Wilson. But in the mean time, IF something happens you might be in a lot of trouble. Even without some unfortunate incident you cannot for example export your data pool, pull the drives and leave the root pool. In my case the slog slice wouldn't be the slog for the root pool, it would be the slog for a second data pool. If the device went bad, I'd have to replace it, true. But if the device goes bad, then so did a good part of my root pool, and I'd have to replace that too. Mirror the slog to match your mirrored root pool. Yep. That was the plan. I was just explaining that not being able to remove the slog wasn't an issue for me since I planned on always having that device available. I was more curious about whether there were any diown sides to sharing the SSD between the root pool and the slog? I think it is a great idea, assuming the SSD has good write performance. This one claims up to 230MB/s read and 180MB/s write and it's only $196. http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393 Compared to this one (250MB/s read and 170MB/s write) which is $699. Oops. Forgot the link: http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014 Are those claims really trustworthy? They sound too good to be true! -Kyle -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD's and ZFS...
Greg Mason wrote: I think it is a great idea, assuming the SSD has good write performance. This one claims up to 230MB/s read and 180MB/s write and it's only $196. http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393 Compared to this one (250MB/s read and 170MB/s write) which is $699. Oops. Forgot the link: http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014 Are those claims really trustworthy? They sound too good to be true! -Kyle Kyle- The less expensive SSD is an MLC device. The Intel SSD is an SLC device. That right there accounts for the cost difference. The SLC device (Intel X25-E) will last quite a bit longer than the MLC device. I understand that. That's why I picked that one to compare. It was my understanding that the MLC drives weren't even close performance wise to the SLC ones. This one seems pretty close. How can that be? -Kyle -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD's and ZFS...
Adam Sherman wrote: In the context of a low-volume file server, for a few users, is the low-end Intel SSD sufficient? You're right, it supposedly has less than half the write speed, and that probably won't matter for me, but I can't find a 64GB version of it for sale, and the 80GB version is over 50% more at $314. -Kyle A. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] SSD's and ZFS...
I've started reading up on this, and I know I have a lot more reading to do, but I've already got some questions... :) I'm not sure yet that it will help for my purposes, but I was considering buying 2 SSD's for mirrored boot devices anyway. My main question is: Can a pair of say 60GB SSD's be shared for both the root pool and as an SSD ZIL? Can the installer be configured to make the slice for the root pool something less than the whole disk, leaving another slice for the ZIL? Or would a zVOL in the root pool be a better idea? I doubt 60GB will leave enough space, but would doing this for the L2ARC be useful also? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
chris wrote: Thanks for your reply. What if I wrap the ram in a sheet of lead?;-) (hopefully the lead itself won't be radioactive) I've been looking at the same thing recently. I found these 4 AM3 motherboards with optional ECC memory support. I don't know whether this means ECC works, or ECC memory can be used but ECC will not. Do you? That's a good question. The ASUS specs definitely say unbuffered ECC memory is compatible, but until you mentioned it I never thought about whether the ECC functionality would actually be used.

Asus M4N78 SE, Nvidia nForce 720D Chipset, 4xsata
Asus M4N78-VM, Nvidia GeForce 8200 Chipset, 6xsata, onboard video
Asus M4N82 Deluxe, NVIDIA nForce 980a Chipset, 6xsata
Gigabyte GA-MA770T-UD3P, AMD 770 Chipset, 6xsata

I hadn't located the Gigabyte board yet. I'll have to look at that. The ASUS boards with the AMD chipsets (the models that start with M4A - like the M4A79T) are all true AM3 boards - they take DDR3 memory. All the nVidia chipset boards (even the 980a one) are AM2+/AM3 boards, and (as far as I know) only take DDR2 memory, but that may not matter to you since this will only be a server for you. The chipset isn't supposed to dictate the memory type; that's up to the CPU, but the MB does need to support it in other ways. DDR3 doesn't appear (in any reviews I've seen) to give much benefit with the current processors anyway. What I find more discouraging (since I'm trying to build a desktop/workstation) is that when you go to look for RAM the only ECC memory available (doesn't matter if it's DDR2 or 3) is rated much slower than what is available for non-ECC. For example you can find DDR2 at 1066MHz, or even 1200MHz, but the fastest ECC DDR2 you can get is 800MHz. - It's cheap though, unless you want 4GB DIMMs, then it's outrageous! The 2nd one looks the most promising, and GeForce 8200 seems somewhat supported by Solaris except for sound (don't care) and network (can add another card). I don't see the 1st or the 2nd one at usa.asus.com. The 3rd is the one I've been considering hard lately. In my searching the other brands don't seem to support ECC memory at all. Another thing to remember is the expansion slots. You mentioned putting in a SATA controller for more drives. You'll want to make sure the board has a slot that can handle the card you want. If you're not using graphics then any board with a single PCI-E x16 slot should handle anything. But if you do put in a graphics board you'll want to look at what other slots are available. Not many consumer boards have PCI-X slots, and only some have PCI-E x4 slots. PCI-E x1 slots are getting scarce too. Most of the PCI-E SATA controllers I've seen want a slot at least x4, and many are x8. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best controller card for 8 SATA drives ?
Erik Ableson wrote: Just a side note on the PERC labelled cards: they don't have a JBOD mode so you _have_ to use hardware RAID. This may or may not be an issue in your configuration but it does mean that moving disks between controllers is no longer possible. The only way to do a pseudo JBOD is to create broken RAID 1 volumes which is not ideal. It won't even let you make single drive RAID 0 LUNs? That's a shame. The lack of portability is disappointing. The trade-off though is battery backed cache if the card supports it. -Kyle Cordialement, Erik Ableson +33.6.80.83.58.28 Envoyé depuis mon iPhone On 23 juin 2009, at 04:33, Eric D. Mudama edmud...@bounceswoosh.org wrote: On Mon, Jun 22 at 15:46, Miles Nordin wrote: edm == Eric D Mudama edmud...@bounceswoosh.org writes: edm We bought a Dell T610 as a fileserver, and it comes with an edm LSI 1068E based board (PERC6/i SAS). which driver attaches to it? pciids.sourceforge.net says this is a 1078 board, not a 1068 board. please, be careful. There's too much confusion about these cards. Sorry, that may have been confusing. We have the cheapest storage option on the T610, with no onboard cache. I guess it's called the Dell SAS6i/R while they reserve the PERC name for the ones with cache. I had understood that they were basically identical except for the cache, but maybe not. Anyway, this adapter has worked great for us so far. snippet of prtconf -D: i86pc (driver name: rootnex) pci, instance #0 (driver name: npe) pci8086,3411, instance #6 (driver name: pcie_pci) pci1028,1f10, instance #0 (driver name: mpt) sd, instance #1 (driver name: sd) sd, instance #6 (driver name: sd) sd, instance #7 (driver name: sd) sd, instance #2 (driver name: sd) sd, instance #4 (driver name: sd) sd, instance #5 (driver name: sd) For this board the mpt driver is being used, and here's the prtconf -pv info: Node 0x1f assigned-addresses: 81020010..fc00..0100.83020014.. df2ec000..4000.8302001c. .df2f..0001 reg: 0002.....01020010....0100.03020014....4000.0302001c. ...0001 compatible: 'pciex1000,58.1028.1f10.8' + 'pciex1000,58.1028.1f10' + 'pciex1000,58.8' + 'pciex1000,58' + 'pciexclass,01' + 'pciexclass,0100' + 'pci1000,58.1028.1f10.8' + 'pci1000,58.1028.1f10' + 'pci1028,1f10' + 'pci1000,58.8' + 'pci1000,58' + 'pciclass,01' + 'pciclass,0100' model: 'SCSI bus controller' power-consumption: 0001.0001 devsel-speed: interrupts: 0001 subsystem-vendor-id: 1028 subsystem-id: 1f10 unit-address: '0' class-code: 0001 revision-id: 0008 vendor-id: 1000 device-id: 0058 pcie-capid-pointer: 0068 pcie-capid-reg: 0001 name: 'pci1028,1f10' --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS attributes for CIFS.
Hi all, I'm setting up a new fileserver, and while I'm not planning on enabling CIFS right away, I know I will in the future. I know there are several ZFS properties or attributes that affect how CIFS behaves. I seem to recall that at least one of those needs to be set early (like when the filesystem [or pool?] is created). Which properties might those be? Where can I find more info on the CIFS/ZFS interaction? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
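(Follow-up for the archive: the one I'm fairly sure is creation-time only is casesensitivity - normalization and utf8only are in the same boat, I believe - while things like sharesmb and nbmand can be changed later. A rough sketch, with made-up pool and filesystem names:

  # these can only be set when the filesystem is created
  zfs create -o casesensitivity=mixed -o normalization=formD tank/shares
  # these can be flipped whenever CIFS actually gets enabled
  zfs set nbmand=on tank/shares
  zfs set sharesmb=name=shares tank/shares

The CIFS/SMB sections of the ZFS Administration Guide cover the details.)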
Re: [zfs-discuss] compression at zfs filesystem creation
Bob Friesenhahn wrote: On Mon, 15 Jun 2009, Thommy M. wrote: In most cases compression is not desirable. It consumes CPU and results in uneven system performance. IIRC there was a blog about I/O performance with ZFS stating that it was faster with compression ON as it didn't have to wait for so much data from the disks and that the CPU was fast at unpacking data. But sure, it uses more CPU (and probably memory). I'll believe this when I see it. :-) With really slow disks and a fast CPU it is possible that reading data the first time is faster. However, Solaris is really good at caching data so any often-accessed data is highly likely to be cached and therefore read just one time. One thing I'm curious about... When reading compressed data, is it cached before or after it is uncompressed? If before, then while you've saved re-reading it from the disk, there is still (redundant) overhead for uncompressing it over and over. If the uncompressed data is cached, then I agree it sounds like a total win for read-mostly filesystems. -Kyle The main point of using compression for the root pool would be so that the OS can fit on an abnormally small device such as a FLASH disk. I would use it for a read-mostly device or an archive (backup) device. On desktop systems the influence of compression on desktop response is quite noticeable when writing, even with very fast CPUs and multiple cores. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
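(If anyone wants to try the experiment, it's cheap to flip on a scratch filesystem - a sketch with invented names:

  zfs set compression=on tank/scratch    # lzjb unless you pick another algorithm
  zfs get compressratio tank/scratch     # shows how well it's actually compressing

Keep in mind only blocks written after the property is set get compressed, so a fair read test means rewriting the data first.)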
Re: [zfs-discuss] compression at zfs filesystem creation
Darren J Moffat wrote: Kyle McDonald wrote: Bob Friesenhahn wrote: On Mon, 15 Jun 2009, Thommy M. wrote: In most cases compression is not desirable. It consumes CPU and results in uneven system performance. IIRC there was a blog about I/O performance with ZFS stating that it was faster with compression ON as it didn't have to wait for so much data from the disks and that the CPU was fast at unpacking data. But sure, it uses more CPU (and probably memory). I'll believe this when I see it. :-) With really slow disks and a fast CPU it is possible that reading data the first time is faster. However, Solaris is really good at caching data so any often-accessed data is highly likely to be cached and therefore read just one time. One thing I'm curious about... When reading compressed data, is it cached before or after it is uncompressed? The decompressed (and decrypted) data is what is cached in memory. Currently the L2ARC stores decompressed (but encrypted) data on the cache devices. So the cache saves not only the time to access the disk but also the CPU time to decompress. Given this, I think it could be a big win. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Much room for improvement for zfs destroy -r ...
Joep Vesseur wrote: All, I was wondering why zfs destroy -r is so excruciatingly slow compared to parallel destroys. SNIP while a little handy-work with # time for i in `zfs list | awk '/blub2\\// {print $1}'` ;\ do ( zfs destroy $i ) ; done yields real 0m8.191s user 0m6.037s sys 0m16.096s A 38.8-times improvement (at the cost of some extra CPU load) Why is there so much overhead in the sequential case? Or have I oversimplified the issues at hand with this simple test? One reason is that you're not timing how long it takes for the destroys to complete. You're only timing how long it takes to start all the jobs in the background. -Kyle Joep ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
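(To actually time the parallel case you'd want to background the destroys and then wait for all of them to finish, something like this untested sketch using the same pool name as above:

  time ( for i in `zfs list -H -o name | grep '^blub2/'` ; do
           zfs destroy $i &
         done ; wait )

Without the wait, the number mostly reflects how fast the shell can launch background jobs.)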
Re: [zfs-discuss] RFE for two-level ZFS
On 2/20/2009 9:33 AM, Gary Mills wrote: On Thu, Feb 19, 2009 at 09:59:01AM -0800, Richard Elling wrote: Gary Mills wrote: Should I file an RFE for this addition to ZFS? The concept would be to run ZFS on a file server, exporting storage to an application server where ZFS also runs on top of that storage. All storage management would take place on the file server, where the physical disks reside. The application server would still perform end-to-end error checking but would notify the file server when it detected an error. Currently, this is done as a retry. But retries can suffer from cached badness. So, ZFS on the application server would retry the read from the storage server. This would be the same as it does from a physical disk, I presume. However, if the checksum failure persisted, it would declare an error. That's where the RFE comes in, because it would then notify the file server to utilize its redundant data source. Perhaps this could be done as part of the retry, using existing protocols. I'm no expert, but I think not only would this have been taken care of by the retry, but if the error is being introduced by any HW or SW on the storage server's end, then the storage server will already be checking its checksums. The main place the new errors could be introduced will be after the data left ZFS's control, heading out the network interface across the wires, and into the application server... While not impossible for the same error to creep in on every retry, I think it'd be rarer than different errors each time, and the retries would have a very good chance of eventually getting good copies of every block. Even if the application server could notify the storage server of the problem, there isn't anything more the storage server can do. If there was a problem that its redundancy could fix, its checksums would have identified that, and it would have fixed it even before the data was sent to the application server. There are several advantages to this configuration. One current recommendation is to export raw disks from the file server. Some storage devices, including I assume Sun's 7000 series, are unable to do this. Another is to build two RAID devices on the file server and to mirror them with ZFS on the application server. This is also sub-optimal as it doubles the space requirement and still does not take full advantage of ZFS error checking. Splitting the responsibilities works around these problems. I'm not convinced, but here is how you can change my mind. 1. Determine which faults you are trying to recover from. I don't think this has been clearly identified, except that they are ``those faults that are only detected by end-to-end checksums''. Adding ZFS on the appserver will add a new set of checksums for the data's journey over the wire and back again. Nothing will be checking those checksums on the storage server to see if corruption happened to writes on the way there (which might be a place for improvement - but I'm not sure how that can even be done,) but those same checksums will be sent back to the appserver on a read, so the appserver will be able to determine the problem then - Of course if the corruption happened while sending the write, then no amount of retries will help. Only ZFS redundancy on the app server can (currently) help with that. -Kyle 2. Prioritize these faults based on their observability, impact, and rate. Perhaps the project should be to extend end-to-end checksums in situations that don't have end-to-end redundancy. 
Redundancy at the storage layer would be required, of course. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/13/2009 5:58 AM, Ross wrote: huh? but that loses the convenience of USB. I've used USB drives without problems at all, just remember to zpool export them before you unplug. I think there is a subcommand of cfgadm you should run to notify Solaris that you intend to unplug the device. I don't use USB, and my familiarity with cfgadm (for FC and SCSI) is limited. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
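(Roughly what I had in mind - the attachment point name here is made up, check 'cfgadm -l' on your own box:

  zpool export usbpool          # unmounts everything and marks the pool exported
  cfgadm -l                     # find the usbN/M attachment point for the drive
  cfgadm -c unconfigure usb1/3  # tell Solaris you're about to pull it

Someone who uses USB storage regularly can correct the details.)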
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/10/2009 3:37 PM, D. Eckert wrote: (...) Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 spanning removable drives, you probably wouldn't have been so lucky. (...) we are not talking about a RAID 5 array or an LVM. We are talking about a single FS setup as a zpool over the entire available disk space on an external USB HDD. OK, then the parallel on Linux would still be something like running reiserfs on a single-disk LVM (which I think Red Hat still installs with by default?) And my real point is that with ZFS, even though you only want a single FS on a single disk, you can't treat it like the LVM/RAID level of software isn't there just because you only have one disk. It is still there, and you need to understand its commands and how to use them when you want to disconnect the disk. I decided to do so due to the read/write speed performance of zfs compared to UFS/ReiserFS. That's fine. If you have reasons to use a single disk that option is still available. Again that doesn't mean you can treat it like a FS on a raw device. -Kyle Regards, DE. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/10/2009 4:48 PM, Roman V. Shaposhnik wrote: On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote: These posts do sound like someone who is blaming their parents after breaking a new toy before reading the instructions. It looks like there's a serious denial of the fact that bad things do happen to even the best of people on this thread. No one is denying that that can happen. However there are many things that were done here that increased the chance (or things that weren't done that could have decreased the chance) of this happening. I'm not saying the OP should have known better. Everyone learns from mistakes. I'm just trying to explain to him both why what happened might have happened, and what he could have done that might have avoided it. Is it still possible that something like this could have happened? Sure. Should there be a better way to handle it when it does? You bet! -Kyle Thanks, Roman. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] where did my 400GB space go?
On 2/11/2009 12:11 PM, Bob Friesenhahn wrote: My understanding is that 1TB is the maximum bootable disk size since EFI boot is not supported. It is good that you were allowed to use the larger disk, even if its usable space is truncated. I don't dispute that, but I don't understand it either. If EFI is not being used (ZFS boot doesn't use EFI on the root pool since the BIOS doesn't (usually) understand the EFI label) then what is it that has a 1TB limit? I believe Linux (and I'd guess NTFS) can use the whole disk past 1TB, so my guess is the old-fashioned PC/DOS/FDisk partition tables can handle sizes over 1TB now (though I know they couldn't in the past.) Anyone know what the bottleneck is? -Kyle Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/11/2009 12:35 PM, Toby Thain wrote: On 11-Feb-09, at 11:19 AM, Tim wrote: ... And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up. By chance, certainly not design. Yep. I've never unplugged a USB drive on purpose, but I have left a drive plugged into the docking station, hibernated Windows XP Professional, undocked the laptop, and then woken it up later undocked. It routinely would pop up windows saying that a 'delayed write' was not successful on the now missing drive. I've always counted myself lucky that any new data written to that drive was written long long before I hibernated, because I have yet to find any problems with that data (but I don't read it very often if at all.) But it is luck only! -Kyle --Toby --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] where did my 400GB space go?
On 2/11/2009 12:57 PM, Tomas Ögren wrote: On 11 February, 2009 - Kyle McDonald sent me these 1,2K bytes: On 2/11/2009 12:11 PM, Bob Friesenhahn wrote: My understanding is that 1TB is the maximum bootable disk size since EFI boot is not supported. It is good that you were allowed to use the larger disk, even if its usable space is truncated. I don't dispute that, but I don't understand it either. If EFI is not being used (ZFS boot doesn't use EFI on the root pool since the BIOS doesn't (usually) understand the EFI label) then what is it that has a 1TB limit? SMI/VTOC, the original label (partition table format:ish) system used. EFI can use larger, but EFI tables for boot aren't supported right now. I guess you should be able to put the rpool on a 50GB slice or so, then put the other 1450GB in an EFI data pool. OK. So while the fdisk Solaris partition could be made to use the whole disk, the Solaris label/VTOC inside the Solaris fdisk partition can only use 1TB of that. Since you can't mix EFI and FDisk partition tables, and you can't have more than one Solaris fdisk partition (that I'm aware of anyway) it looks like 1TB is all you can give Solaris at the moment. But you could give that other 400GB to some other OS or filesystem I suppose. Since EFI boot requires (IIRC) x86 HW vendors to improve the BIOS support, EFI boot isn't going to be useful for a while even if it appeared tomorrow. Is there any hope, or plan, to improve/fix the Solaris VTOC? -Kyle It's just that you can't have the rpool > 1TB due to boot limits. /Tomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] where did my 400GB space go?
On 2/11/2009 1:03 PM, Kyle McDonald wrote: Since you can't mix EFI and FDisk partition tables, and you can't have more than one Solaris fdisk partition (that I'm aware of anyway) it looks like 1TB is all you can give Solaris at the moment. I should have qualified that with "if you need to boot from it." Of course if you don't need to boot from it, Solaris can just put an EFI label on it and use the whole thing. If it were me I'd find some small drive to put the OS on and save that nice big drive for a second (non-root) pool that can use the whole thing. Also since those drives are generally under $200 now, I'd probably pick up a second and mirror the 2. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
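(In other words, something like this - device names are only examples:

  # the small disk holds the bootable rpool (SMI label, handled by the installer)
  # hand zpool the big disks whole, and it will put an EFI label on them
  zpool create datapool mirror c1t1d0 c1t2d0

With whole disks under an EFI label the 1TB VTOC limit never comes into play.)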
Re: [zfs-discuss] where did my 400GB space go?
On 2/11/2009 1:50 PM, Richard Elling wrote: Solaris can now (as of b105) use extended partitions. http://www.opensolaris.org/os/community/on/flag-days/pages/2008120301/ That's interesting, but I'm not sure how it helps. It's my understanding that Solaris doesn't like it if more than one of the fdisk partitions (primary or extended) is of type 'Solaris[2]'. Has that changed? -Kyle -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/10/2009 2:50 PM, D. Eckert wrote: (..) Dave made a mistake pulling out the drives without exporting them first. For sure also UFS/XFS/EXT4/.. doesn't like that kind of operations, but only with ZFS you risk losing ALL your data. That's the point! (...) I did that many times after performing the umount cmd with ufs/reiserfs filesystems on USB external drives. And they never complained or got corrupted. Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 spanning removable drives, you probably wouldn't have been so lucky. Just because you only create a single ZFS filesystem inside your zpool, doesn't mean that when that single filesystem is unmounted it is safe to remove the drive. When you consider the extra layer of the zpool (like LVM or sw RAID) it's not surprising there are other things you have to do before you remove the disk. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/10/2009 2:54 PM, D. Eckert wrote: I disagree, see posting above. ZFS just accepts it 2 or 3 times. after that, your data are passed away to nirvana for no reason. And it should be legal, to have an external USB drive with a ZFS. with all respect, why should a user always care for redundancy, e. g. setup a mirror on a single HDD between the slices?? You don't have to have redundancy. But if you don't, then I don't know how you can expect the 'repair' features of ZFS to bail you out when something bad happens. This reduces half your available space you have on your drive. Mirroring between slices does more than that. It will ruin your performance also. It'd be much better to set 'copies=2', though that will still reduce your space by half. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
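(For reference, that's a one-liner - the pool name is the one from the thread:

  zfs set copies=2 usbhdd1    # only data written after this gets two copies

It gives the checksum repair something to work with on a single disk, but it won't help data already on the drive, and it obviously won't survive losing the whole disk.)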
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hi Dave, Having read through the whole thread, I think there are several things that could all be adding to your problems. At least some of which are not related to ZFS at all. You mentioned the ZFS docs not warning you about this, and yet I know the docs explicitly tell you that: 1. While a ZFS pool that has no redundancy (mirroring or parity, like yours is missing) can still *detect* errors in the data read from the drive, it can't *repair* those errors. Repairing errors requires that ZFS be performing (at least) the (top-most level of) mirroring or parity functions. Since you have no mirroring or parity, ZFS cannot automatically recover this data. 2. As others have said, a zpool can contain many filesystems. 'zfs umount' only unmounts a single filesystem. Removing a full pool from a machine requires a 'zpool export' no matter what disk technology is being used (USB, SCSI, SATA, FC, etc.) On the new system you would use 'zpool import' to bring the pool into the new system. I'm sure this next one is documented by Sun also, though not in the ZFS docs, probably in some other part of the system dealing with removable devices: 3. In addition, according to Casper's message you need to 'off-line' USB (and probably other types too) storage in Solaris (just like in Windows) before pulling the plug. This has nothing to do with ZFS. This would have corrupted (possibly even past the point of repair) most other filesystems also. Still, I had an idea on something you might try. I don't know how long it's been since you pulled the drive, or what else you've done since. Which machine is reporting the errors you've shown us? The machine you pulled the drives from? Or the machine you moved them to? Were you successful in importing the pool into the other machine with 'zpool import'? This idea might work either way, but if you haven't successfully imported it into another machine there's probably more of a chance. If the output is from the machine you pulled them out of, then basically that machine still thinks the pool is connected to it, and it thinks the one and only disk in the pool is now not responding. In this case the errors you see in the tables are the errors from trying to contact a drive that no longer exists. Have you reconnected the disk to the original machine yet? If not, I'd attempt a 'zpool export' now (though that may not work) and then shut the machine down fully, and connect the disk. Then boot it all up. Depending on what you've tried to do with this disk to fix the problem since it happened, I have no idea exactly how the machine will come up. If you couldn't do the 'zpool export', then the machine will try to mount the FS's in the pool on boot. This may or may not work. If you were successful in doing the export with the disks disconnected, then it won't try, and you'll need to 'zpool import' them after the machine is booted. Depending on how the import goes, you might still see errors in the 'zpool status' output. If so, I know a 'zpool clear' will clear those errors, and I doubt it can make the situation any worse than it is now. You'd have to give us info about what the machine tells you after this before I can advise you more. But (and the experts can correct me if I'm wrong) this might 'just work(tm)'. My theory here is that ZFS may have been successful in keeping the state of the (meta)data on the disk consistent after all. The checksum and I/O errors listed may be from ZFS trying to access the non-existent drive after you removed it. 
Which (in theory) are all bogus errors, and don't really point to errors in the data on the drive. Of course there are many things that all have to be true for this theory to turn out to be true. Depending on what has happened to the machines and the disks since they were originally unplugged from each other, all bets might be off. And then there's the possibility that my idea never could work at all. People much more expert than I can chime in on that. -Kyle D. Eckert wrote: Hi, after working for 1 month with ZFS on 2 external USB drives I have experienced, that the all new zfs filesystem is the most unreliable FS I have ever seen. Since working with the zfs, I have lost data from: 1 80 GB external Drive 1 1 Terabyte external Drive It is a shame, that zfs has no filesystem management tools for repairing e. g. being able to repair those errors: NAME STATE READ WRITE CKSUM usbhdd1 ONLINE 0 0 8 c3t0d0s0 ONLINE 0 0 8 errors: Permanent errors have been detected in the following files: usbhdd1: 0x0 It is indeed very disappointing that moving USB zpools between computers ends in 90 % with a massive loss of data. This is due to the not reliably working command zfs umount poolname, even if the output of mount shows you, that the pool is no longer mounted and is removed from mntab. It works only 1
Re: [zfs-discuss] ZFS: unreliable for professional usage?
D. Eckert wrote: too many words wasted, but not a single word, how to restore the data. I have read the man pages carefully. But again: there's nothing said, that on USB drives zfs umount pool is not allowed. It is allowed. But it's not enough. You need to read both the 'zpool' and 'zfs' manpages. The 'zpool' manpage will tell you that the way to move the 'whole pool' to another machine is to run 'zpool export poolname'. The 'zpool export' will actually run the 'zfs umount' for you, though it's not a problem if it's already been done. Note, this isn't USB specific, you won't see anything in the docs about USB. This condition applies to SCSI and others too. You need to export the pool to move it to another machine. If the machine crashed before you could export it, 'zpool import -f' on the new machine can help import it anyway. With USB, there are probably other commands you'll also need to use to notify Solaris that you are going to unplug the drive, just like the 'Safely remove hardware' tool on Windows. Or you need to remove it only when the system is shut down. These commands will be documented somewhere else, not in the ZFS docs, because they don't apply to just ZFS. So how on earth should a simple user know that, if he knows that filesystems are properly unmounted using the umount cmd?? You need to understand that the filesystems are all contained in a 'pool' (more than one filesystem can share the disk space in the same pool). Unmounting the filesystem *does not* prepare the *pool* to be moved from one machine to another. And again: Why should a 2-week-old Seagate HDD suddenly be damaged, if there was no shock, hit or any other event like that? Who knows? Some hard drives are manufactured with problems. Remember that ZFS is designed to catch problems that even the ECC on the drive doesn't catch. So it's not impossible for it to catch errors even the manufacturer's QA tests missed. It is of course easier to blame the stupid user instead of having proper documentation and emergency tools to handle that. I believe that between the man pages, the administration docs on the web, the best practices pages, and all the other blogs and web pages, ZFS is documented well enough. It's not like other filesystems, so there is more to learn, and you need to review all the docs, not just the ones that cover the operations (like unmount) that you're familiar with. Understanding pools (and the commands that manage pools) is also important. Man pages and command references are good when you understand the architecture and need to learn about the details of a command you know you need to use. It's the other documentation that will fill you in on how the system parts work together, and advise you on the best way to set up or do what you want. As I said in my other email, ZFS can't repair errors without a way to reconstruct the data. It needs mirroring, parity (or the copies=x setting) to be able to repair the data; by setting up a pool with no redundancy you gave up that ability. So your email subject line is a little backwards, since any 'professional' usage would incorporate redundancy (Mirror, Parity, etc.) What you're trying to do is more 'home/hobbyist' usage. Though most home/hobbyist users decide to incorporate redundancy for any data they really care about. The list of malfunctions of SNV builds gets longer and longer with every version released. I'm sure new things are added every release, but many are also fixed. sNV is pre-release software after all. 
Overall the problems found aren't around long, and I believe the list gets shorter as often as it gets longer. If you want production-level Solaris, ZFS is available in Solaris 10. e. g. on SNV 107 - installation script is unable to properly write the boot blocks for GRUB - you choose German locale, but have an American keyboard layout in GNOME (since SNV 103) - in SNV 107 adding these lines to xorg.conf: Option XkbRules xorg Option XkbModel pc105 Option XkbLayout de (was working in SNV 103) crashes the X server. - latest Nvidia Driver (Vers. 180) for GeForce 8400M doesn't work with OpenSolaris SNV 107 - nwam and iwk0: not solved, no DHCP responses Yes there was a major update of the X server sources to catch up to the latest(?) X.org release. Workarounds are known, and I bet this will be working again in b108 (or not long after.) it seems better, to stay focused on having a colourful GUI with hundreds of functions no one needs instead of providing a stable core. The core of Solaris is much more stable than anything else I've used. The windowing system is not a part of the core of an operating system in my book. I am looking forward to the day booting OpenSolaris and seeing a greeting Windows XP logo surrounded by the blue bubbles of OpenSolaris. roll-eyes Note that sNV (aka SXCE - or Solaris eXpress Community Edition)
[zfs-discuss] Should I report this as a bug?
I jumpstarted my machine with sNV b106, and installed with ZFS root/boot. It left me at a shell prompt in the JumpStart environment, with my ZFS root on /a. I wanted to try out some things that I planned on scripting for the JumpStart to run, one of these was creating a new ZFS pool from the remaining disks. I looked at the zpool create manpage, and saw that it had a -R altroot option, and the exact same thing had just worked for me with 'dladm aggr-create' so I thought I'd give that a try. If the machine had been booted normally, my ZFS root would have been /, and a 'zpool create zdata0 ...' would have defaulted to mounting the new pool as /zdata0 right next to my ZFS root pool /zroot0. So I expected 'zpool create -R /a zdata0 ...' to set the default mountpoint for the pool to /zdata0 with a temporary altroot=/a. I gave it a try, and while it created the pool it failed to mount it at all. It reported that /a wasn't empty. 'zpool list', and 'zpool get all' show the altroot=/a. But 'zfs get all zdata0' shows the mountpoint=/a also, not the default of /zdata0. Am I expecting the wrong thing here, or is this a bug? -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
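(For anyone else who hits this: the workaround I'd try is to spell out the mountpoint instead of relying on the default - an untested sketch, disk names invented:

  zpool create -R /a -m /zdata0 zdata0 mirror c1t2d0 c1t3d0

i.e. ask for mountpoint=/zdata0 explicitly, so the altroot only prefixes it rather than replacing it.)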
Re: [zfs-discuss] mount race condition?
On 1/28/2009 12:16 PM, Nicolas Williams wrote: On Wed, Jan 28, 2009 at 09:07:06AM -0800, Frank Cusack wrote: On January 28, 2009 9:41:20 AM -0600 Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Tue, 27 Jan 2009, Frank Cusack wrote: i was wondering if you have a zfs filesystem that mounts in a subdir in another zfs filesystem, is there any problem with zfs finding them in the wrong order and then failing to mount correctly? I have not encountered that problem here and I do have a multilevel mount heirarchy so I assume that ZFS orders the mounting intelligently. well, the thing is, if the two filesystems are in different pools (let me repeat the example): Then weird things happen I think. You run into the same problems if you want to mix ZFS and non-ZFS filesystems in a mount hierarchy. You end up having to set the mountpoint property so the mounts don't happen at boot and then write a service to mount all the relevant things in order. Or set them all to legacy, and put them in /etc/vfstab. That's what I do. I have a directory on ZFS that holds ISO images, and a peer directory that contains mountpoints for loopback mounts of all those ISO's. I set the ZFS to legacy, and then in /etc/vfstab I put the FS containing the ISO files before I list all the ISO's to be mounted. -Kyle Nico ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
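(Roughly what that looks like, with names changed:

  zfs set mountpoint=legacy tank/isos

and then in /etc/vfstab, with this line listed before the entries that mount things under it:

  tank/isos  -  /export/isos  zfs  -  yes  -

The entries are processed in order at boot, so the containing filesystem is in place before anything tries to mount on top of it.)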
Re: [zfs-discuss] ZFS filesystem creation during JumpStart
Brad Hudson wrote: Thanks for the response Peter. However, I'm not looking to create a different boot environment (bootenv). I'm actually looking for a way within JumpStart to separate out the ZFS filesystems from a new installation to have better control over quotas and reservations for applications that usually run rampant later. In particular, I would like better control over the following (e.g. the ability to explicitly create them at install time): Whether you want a bootenv or not, that command (and syntax) is the only way to tell JumpStart both to use ZFS instead of UFS, and to customize how it's installed (its option to split out /var is, unfortunately, the only FS that can be split at the moment.) You're not the first to lament over this fact, but I wouldn't hold your breath for any improvements, since JumpStart is not really being actively improved any longer. Sun is instead focusing on its replacement 'AI', which is currently being developed and used on OpenSolaris, and I believe is intended to replace JS on Sun Solaris at some undefined time in the future. At the moment I don't believe that AI has the features you're looking for either. It has quite a few other differences from JS too; if you think you'll use it, you should keep tabs on the project pages and mailing lists. rpool/opt - /opt rpool/usr - /usr rpool/var - /var rpool/home - /home Of the above /home can easily be created post-install, but the others need to have the flexibility of being explicitly called out in the JumpStart profile from the initial install to provide better ZFS accounting/controls. It's not hard to create /opt, or /var/xyz ZFS filesystems, and move files into them during post-install, or first boot even, then remove the originals, and set the ZFS mountpoints to where the originals were. This even gives you the advantage of enabling compression (since all the data will be rewritten and thus compressed.) /usr is harder. Might not be impossible in a finish script, but probably much harder in a first-boot script. All that said, if you're planning on using live upgrade (or snap upgrade on OS) after installation is done, I'm not sure if they'll just 'Do the right thing' (or even work at all) with these other filesystems as they clone and upgrade the new BE's. My bet would be no. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
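(A sketch of the first-boot idea for /opt - dataset name invented, meant to run once on the first boot of the installed system, not in the miniroot; /usr is a different story since it's needed so early in boot:

  zfs create -o compression=on -o mountpoint=/opt.new rpool/opt
  cd /opt && find . -print | cpio -pdum /opt.new   # copy the existing contents in
  rm -rf /opt/*                                    # originals now live in the dataset
  zfs set mountpoint=/opt rpool/opt                # mounts over the now-empty /opt

Doing the same from a finish script is possible too, but the mountpoint handling under /a gets fiddly, so test it carefully.)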
Re: [zfs-discuss] zfs is a co-dependent parent and won't let children leave home
Tim Haley wrote: Ross wrote: While it's good that this is at least possible, that looks horribly complicated to me. Does anybody know if there's any work being done on making it easy to remove obsolete boot environments? If the clones were promoted at the time of their creation the BEs would stay independent and individually deletable. Promotes can fail, though, if there is not enough space. I was told a little while back when I ran into this myself on an Nevada build where ludelete failed, that beadm *did* promote clones. This thread appears to be evidence to the contrary. I think it's a bug, we should either promote immediately on creation, or perhaps beadm destroy could do the promotion behind the covers. If I understand this right, the latter option looks better to me. Why consume the disk space before you have to? What does LU do? -Kyle -tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Custom Jumpstart and RAID-10 ZFS rpool
Ian Collins wrote: Stephen Le wrote: Is it possible to create a custom Jumpstart profile to install Nevada on a RAID-10 rpool? No, simple mirrors only. Though a finish script could add additional simple mirrors to create the config his example would have created. Pretty sure that's still not RAID10 though. And any files laid down by the installer would be constrained to the first mirrored pair; only new files would have a chance at being distributed over the additional pairs. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Custom Jumpstart and RAID-10 ZFS rpool
kristof wrote: I don't think this is possible. I already tried to add extra vdevs after install, but I got an error message telling me that multiple vdevs for rpool are not allowed. K Oh. Ok. Good to know. I always put all my 'data' diskspace in a separate pool anyway to make migration to another host easier, so I haven't actually tried it. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, NFS and Auto Mounting
Douglas R. Jones wrote: 4) I change the auto.ws map thusly: Integration chekov:/mnt/zfs1/GroupWS/ Upgrades chekov:/mnt/zfs1/GroupWS/ cstools chekov:/mnt/zfs1/GroupWS/ com chekov:/mnt/zfs1/GroupWS This is standard NFS behavior (prior to NFSv4). Child filesystems have to be mounted on the NFS client explicitly. As someone else mentioned, NFSv4 has a feature called 'mirror-mounts' that is supposed to automate this for you. For now try this: Integration chekov:/mnt/zfs1/GroupWS/ Upgrades chekov:/mnt/zfs1/GroupWS/ cstools chekov:/mnt/zfs1/GroupWS/ com / chekov:/mnt/zfs1/GroupWS \ /Integration chekov:/mnt/zfs1/GroupWS/Integration Note the \ line continuation character. The last 2 lines are really all one line. If you had had 'Integration' on its own ufs or ext2fs filesystem in the past, but still mounted below 'GroupWS', you would have seen this in the past. It's not a ZFS thing, or a Solaris thing. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
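(Since the wrapping above mangles it, the general shape of a hierarchical (multi-mount) autofs entry is as follows - made-up server and paths, and the exact syntax is from memory, so check automount(4):

  projects  /       fileserver:/export/projects \
            /tools  fileserver:/export/projects/tools

One key, several offsets, so the child filesystem gets mounted along with its parent.)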
Re: [zfs-discuss] ZFS dump and swap
Darren J Moffat wrote: John Cecere wrote: The man page for dumpadm says this: A given ZFS volume cannot be configured for both the swap area and the dump device. And indeed when I try to use a zvol as both, I get: zvol cannot be used as a swap device and a dump device My question is, why not ? Swap is a normal ZVOL and subject to COW, checksum, compression (and coming soon encryption). Would there be no performance benefits from having swap read/write from contiguous preallocated space also? I do realize that nifty features like encryption might be lost in that case, but I'm wondering if there's any performance to be gained? Then again, if you're concerned about performance you need to just buy RAM until you stop swapping altogether, huh? -Kyle Dump ZVOLs are preallocated contiguous space that are written to directly by the ldi_dump routines, they aren't written to by normal ZIO transactions, they aren't checksum'd - the compression is done by the dump layer not by ZFS. This is needed because when we are writing a crash dump we want as little as possible in the IO stack. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
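(For anyone hitting the error above, the fix is simply separate zvols for swap and dump - sizes and names are only examples:

  zfs create -V 2G rpool/dump2
  dumpadm -d /dev/zvol/dsk/rpool/dump2
  zfs create -V 4G rpool/swap2
  swap -a /dev/zvol/dsk/rpool/swap2

Per Darren's explanation, the dump zvol gets preallocated when dumpadm claims it, which is why the same zvol can't double as swap.)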
Re: [zfs-discuss] Greenbytes/Cypress
Richard Elling wrote: Bob Friesenhahn wrote: On Tue, 23 Sep 2008, Eric Schrock wrote: See: http://www.opensolaris.org/jive/thread.jspa?threadID=73740&tstart=0 I must apologize for annoying everyone. When Richard Elling posted the GreenBytes link without saying what it was I completely ignored it. I assumed that it would be Windows-centric content that I cannot view since of course I am a dedicated Solaris user. I see that someone else mentioned that the content does not work for Solaris users. As a result I ignored the entire discussion as being about some silly animation of gumballs. So you admit that you didn't grok it? :-) Dude poured in a big bag of gumballs, but they were de-duped, so the gumball machine only had a few gumballs. I won't admit I didn't grok it. I will admit, however (and this may be worse), that even though I do have a Windows laptop with QuickTime installed, I couldn't get the damn thing to work in Firefox. So I couldn't see it. -Kyle -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Error: value too large for defined data type
Paul Raines wrote: I am having a very odd problem on one of our ZFS filesystems. On certain files, when accessed on the Solaris server itself locally where the zfs fs sits, we get an error like the following: [EMAIL PROTECTED] # ls -l ./README: Value too large for defined data type total 36 -rw-r- 1 mreuter mreuter 1019 Sep 25 2006 Makefile -rw-r- 1 mreuter mreuter 3185 Feb 22 2000 lcompgre.cc -rw-r- 1 mreuter mreuter 3238 Feb 22 2000 lcompgsh.cc -rw-r- 1 mreuter mreuter 2485 Feb 22 2000 lcompreg.cc -rw-r- 1 mreuter mreuter 2774 Feb 22 2000 lcompshf.cc Do you by chance have /usr/gnu/bin, or any directory with a GNU 'ls', in your path before /usr/bin? (what does 'which ls' show?) I've seen this with GNU ls that I have compiled myself as far back as Solaris 9, maybe earlier. By default GNU ls compiled on Solaris doesn't know how to handle large files (and therefore probably not 64-bit dates either.) When I've seen this, explicitly running /usr/bin/ls -l worked fine, and I suspect it will for you too. -Kyle The odd thing is that when the filesystem is accessed from our Linux boxes over NFS, there is no error accessing the same file vader:complex[84] ls -l total 24 drwxr-x---+ 2 mreuter mreuter8 Sep 25 2006 . drwxr-x---+ 5 mreuter mreuter5 Mar 31 1997 .. -rw-r-+ 1 mreuter mreuter 3185 Feb 22 2000 lcompgre.cc -rw-r-+ 1 mreuter mreuter 3238 Feb 22 2000 lcompgsh.cc -rw-r-+ 1 mreuter mreuter 2485 Feb 22 2000 lcompreg.cc -rw-r-+ 1 mreuter mreuter 2774 Feb 22 2000 lcompshf.cc -rw-r-+ 1 mreuter mreuter 1019 Sep 25 2006 Makefile -rw-r-+ 1 mreuter mreuter 1435 Jan 4 1945 README vader:mreuter:complex[85] wc README 40 181 1435 README The file is obviously small so this is not a large file problem. Anyone have an idea what gives? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
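(Quick way to check which ls is being picked up - and, if it turns out to be a home-built GNU ls, the build tweak I'd try; the coreutils part is a sketch I haven't re-verified:

  which ls                # is /usr/gnu/bin or a home-built ls ahead of /usr/bin?
  /usr/bin/ls -l README   # the native ls should list it without complaint
  # when rebuilding GNU coreutils, pull in the large-file flags:
  CFLAGS="`getconf LFS_CFLAGS`" ./configure && make

That usually sorts out the 'Value too large' complaints from self-compiled tools.)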
Re: [zfs-discuss] ZFS Pools 1+TB
Daniel Rock wrote: Kenny wrote: 2. c6t600A0B800049F93C030A48B3EA2Cd0 SUN-LCSM100_F-0670-931.01GB /scsi_vhci/[EMAIL PROTECTED] 3. c6t600A0B800049F93C030D48B3EAB6d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] Disk 2: 931GB Disk 3: 931MB Do you see the difference? Not just disk 3: AVAILABLE DISK SELECTIONS: 3. c6t600A0B800049F93C030D48B3EAB6d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] 4. c6t600A0B800049F93C031C48B3EC76d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] 8. c6t600A0B800049F93C031048B3EB44d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] This all makes sense now, since a RAIDZ (or RAIDZ2) vdev can only be as big as its *smallest* component device. -Kyle Daniel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Pools 1+TB
Kenny wrote: How did you determine from the format output the GB vs MB amount?? Where do you compute 931 GB vs 932 MB from this?? 2. c6t600A0B800049F93C030A48B3EA2Cd0 /scsi_vhci/[EMAIL PROTECTED] 3. c6t600A0B800049F93C030D48B3EAB6d0 /scsi_vhci/[EMAIL PROTECTED] It's in the part you didn't cut and paste: AVAILABLE DISK SELECTIONS: 3. c6t600A0B800049F93C030D48B3EAB6d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] 4. c6t600A0B800049F93C031C48B3EC76d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] 8. c6t600A0B800049F93C031048B3EB44d0 SUN-LCSM100_F-0670-931.01MB /scsi_vhci/[EMAIL PROTECTED] Look at the label: SUN-LCSM100_F-0670-931.01MB The last field. Please educate me!! grin No problem. Things like this have happened to me from time to time. -Kyle Thanks again! --Kenny -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best layout for 15 disks?
mike wrote: Sorry :) Okay, so you can create a zpool from multiple vdevs. But you cannot add more vdevs to a zpool once the zpool is created. Is that right? Nope. That's exactly what you *CAN* do. So say today you only really need 6TB usable, you could go buy 8 of your 1TB disks, and set up a pool with a single 7-disk RAIDZ1 vDev, and a single spare today. Later, when disks are cheaper and you need the space, you could add a second 7-disk RAIDZ1 to the pool. This way you'd gradually grow into exactly the example you gave earlier. Also, while it makes sense to use the same size drives in the same vDev, additional vDevs you add later can easily be made from different size drives. For the example above, when you got around to adding the second vDev, 2TB disks might be out; for the same space, you could create a vDev with fewer 2TB drives, or a vDev with the same number of drives and add twice the space, or some combo in between - Just because your first vDev had 7 disks doesn't mean the others have to. Another note: as someone said earlier, if you can go to 16 drives, you should consider two 8-disk RAIDZ2 vDevs over two 7-disk RAIDZ vDevs with a spare, or (I would think) even a 14-disk RAIDZ2 vDev with a spare. If you can (now or later) get room to have 17 drives, two 8-disk RAIDZ2 vDevs with a spare would be your best bet. And remember you can grow into it... 1 vDev and spare now, second vDev later. -Kyle That's what it sounded like someone said earlier. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
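(Spelled out, the grow-into-it path looks like this - device names invented:

  # today: one 7-disk RAIDZ1 vDev plus a spare
  zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 spare c1t7d0
  # later: add a second RAIDZ1 vDev, and the new space shows up immediately
  zpool add tank raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

Existing data stays where it was written; new writes get striped across both vDevs.)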
Re: [zfs-discuss] Best layout for 15 disks?
mike wrote: Or do smaller groupings of raidz1's (like 3 disks) so I can remove them and put 1.5TB disks in when they come out for instance? I wouldn't reduce it to 3 disks (you should almost mirror if you go that low). Remember, while you can't take a drive out of a vDev, or a vDev out of a pool, you can *replace* the drives in a vDev. For example, if you have 8 1TB drives in a RAIDZ (1 or 2) vDev and buy 8 1.5TB drives, instead of adding a second vDev (which is always an option), you can replace 1 drive at a time, and as soon as the last drive in the vDev is swapped, you'll see the space in the pool jump. Granted, if you need to buy drives gradually, swapping out 3 at a time (with 3-disk vDevs) is easier than 8 at a time, but you'll lose 33% of your space to parity instead of 25%, and you'll only be able to lose one disk (of each set of 3) at a time. -Kyle ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
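(The replace-in-place path, again with made-up device names, one disk at a time:

  zpool replace tank c1t0d0 c3t0d0
  zpool status tank    # wait for the resilver to finish before swapping the next disk

Once the last drive in the vDev has been replaced the extra space should appear, though on some builds an export/import is needed before the pool notices.)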