Re: [zfs-discuss] S11 vs illumos zfs compatibility
On 3 Jan 12, at 04:22 , Darren J Moffat wrote:

On 12/28/11 06:27, Richard Elling wrote:
On Dec 27, 2011, at 7:46 PM, Tim Cook wrote:
On Tue, Dec 27, 2011 at 9:34 PM, Nico Williams n...@cryptonector.com wrote:
On Tue, Dec 27, 2011 at 8:44 PM, Frank Cusack fr...@linetwo.net wrote:

So with a de facto fork (illumos) now in place, is it possible that two zpools will report the same version yet be incompatible across implementations?

This was already broken by Sun/Oracle when the deduplication feature was not backported to Solaris 10. If you are running Solaris 10, then zpool version 29 features are not implemented.

Solaris 10 does have some deduplication support: it can import and read datasets in a deduped pool just fine. You can't enable dedup on a dataset, and any writes won't dedup; they will rehydrate. So it is more like partial dedup support rather than it not being there at all.

rehydrate??? Is it instant or freeze dried?

Mike
---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716
Re: [zfs-discuss] commercial zfs-based storage replication software?
Maybe I'm missing something here, but Amanda has a whole bunch of bells and whistles, and scans the filesystem to determine what should be backed up. Way overkill for this task, I think. Seems to me like zfs send blah | ssh replicatehost zfs receive … more than meets the requirement when combined with just plain old crontab.

If it's a graphical interface you're looking for, I'm sure someone has hacked together something in Tcl/Tk or Perl/Tk as an interface to cron, which you could probably adapt to construct your particular crontab entry.

Just a thought,

Mike
---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716

On 30 Sep 11, at 07:33 , Fajar A. Nugraha wrote:

On Fri, Sep 30, 2011 at 7:22 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha

Does anyone know a good commercial zfs-based storage replication software that runs on Solaris (i.e. not an appliance, not another OS based on Solaris)? Kinda like Amanda, but for replication (not backup).

Please define "replication, not backup"? To me, your question is unclear about what you want to accomplish. What don't you like about zfs send | zfs receive?

Basically I need something that does zfs send | zfs receive, plus a GUI/web interface to configure stuff (e.g. which fs to back up, schedule, etc.), support, and a price tag. Believe it or not, the last two requirements are actually important (don't ask :P ), and are the main reasons why I can't use the automated send/receive scripts already available on the internet.

CMIIW, Amanda can use zfs send, but it only stores the resulting stream somewhere, while the requirement for this one is that the send stream must be received on a different server (e.g. a DR site) and be accessible there.

--
Fajar
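For illustration only, a minimal sketch of the cron-driven send/receive being suggested above. The pool, dataset, host, and snapshot names are hypothetical, and a real script would need snapshot rotation and error handling:

  # zfs snapshot tank/data@base
  # zfs send tank/data@base | ssh replicatehost zfs receive backup/data

(a one-time full copy that seeds backup/data on the DR host)

  # zfs snapshot tank/data@nightly
  # zfs send -i tank/data@base tank/data@nightly | ssh replicatehost zfs receive -F backup/data

(the incremental step a cron job would repeat, e.g. a crontab entry such as 30 1 * * * /usr/local/bin/replicate.sh)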
Re: [zfs-discuss] Disable ZIL - persistent
On 5 Aug 11, at 08:14 , Darren J Moffat wrote:

On 08/05/11 13:11, Edward Ned Harvey wrote:

My question: Is there any way to make "disabled ZIL" a normal mode of operations in Solaris 10? Particularly: If I do this

  echo zil_disable/W0t1 | mdb -kw

then I have to remount the filesystem. It's kind of difficult to do this automatically at boot time, and impossible (as far as I know) for rpool. The only solution I see is to write some startup script which applies it to filesystems other than rpool. Which feels kludgy. Is there a better way?

  echo "set zfs:zil_disable = 1" >> /etc/system

  echo "set zfs:zil_disable = 1" >> /etc/system

Mike
---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716
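As a minimal sketch of how that tunable would typically be applied and then checked, assuming the usual /etc/system workflow (the mdb verification line is an assumption about how one would confirm the setting, not something stated in the thread):

  # echo "set zfs:zil_disable = 1" >> /etc/system    (persists across reboots)
  # reboot
  # echo "zil_disable/D" | mdb -k                    (print the current value to confirm it took effect)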
Re: [zfs-discuss] question about COW and snapshots
On 17 Jun 11, at 21:02 , Ross Walker wrote:

On Jun 16, 2011, at 7:23 PM, Erik Trimble erik.trim...@oracle.com wrote:

On 6/16/2011 1:32 PM, Paul Kraus wrote:
On Thu, Jun 16, 2011 at 4:20 PM, Richard Elling richard.ell...@gmail.com wrote:

You can run OpenVMS :-)

Since *you* brought it up (I was not going to :-), how does VMS' versioning FS handle those issues?

It doesn't, per se. VMS's filesystem has a versioning concept (i.e. every time you do a close() on a file, it creates a new file with the version number appended, e.g. foo;1 and foo;2 are the same file, different versions). However, it is completely missing the rest of the features we're talking about, like data *consistency* in that file. It's still up to the app using the file to figure out what data consistency means, and such. Really, all VMS adds is versioning, nothing else (no API, no additional features, etc.).

I believe NTFS was built on the same concept of file streams the VMS FS used for versioning. It's a very simple versioning system. Personally I use SharePoint, but there are other content management systems out there that provide what you're looking for, so no need to bring out the crypt keeper.

I think, from following this whole discussion, that people are wanting Versions, which will be offered by OS X Lion soon. However, it is dependent upon applications playing nice, behaving, and using the standard APIs.

It would likely take a major overhaul in the way ZFS handles snapshots to create them at the object level rather than the filesystem level. Might be a nice exploratory exercise for those in the know with the ZFS roadmap, but then there are two roadmaps, right?

Also, consistency and integrity cannot be guaranteed at the object level, since an application may have more than a single filesystem object in use at a time, and operations would need to be transaction based with commits and rollbacks.

Way off-topic, but Smalltalk and its variants do this by maintaining the state of everything in an operating environment image.

But then again, I could be wrong.

Mike
---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716
Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?
On 17 Jun 11, at 21:14 , Bob Friesenhahn wrote:

On Fri, 17 Jun 2011, Jim Klimov wrote:

I gather that he is trying to expand his root pool, and you can not add a vdev to one. Though, true, it might be possible to create a second, data pool, in the partition. I am not sure if zfs can make two pools in different partitions of the same device though - underneath it still uses Solaris slices, and I think those can be used on one partition. That was my assumption for a long time, though never really tested.

This would be a bad assumption. Zfs should not care, and you are able to do apparently silly things with it. Sometimes allowing potentially silly things is quite useful.

This is true. If one has mirrored disks, you could do something like I explain here with regard to partitioning and resizing pools:

http://www.kamiogi.net/Kamiogi/Frame_Dragging/Entries/2009/5/19_Everything_in_Its_Place_-_Moving_and_Reorganizing_ZFS_Storage.html

I did some shuffling using Solaris partitions here on a home server, but it was using mirrors of disks with the same geometry. You might be able to do a similar shuffle using an external USB drive which was appropriately sized, and turn on autoexpand.

Mike
---
Michael Sullivan
m...@axsh.us
http://www.axsh.us/
Phone: +1-662-259-
Mobile: +1-662-202-7716
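For reference, a minimal sketch of the disk-for-disk shuffle being described, assuming a mirrored pool named tank and hypothetical device names; each side of the mirror is replaced in turn, and the pool grows once both sides are the larger size:

  # zpool set autoexpand=on tank
  # zpool replace tank c0t2d0 c0t4d0    (swap one side of the mirror for the larger disk, wait for resilver)
  # zpool replace tank c0t3d0 c0t5d0    (then the other side)
  # zpool list tank                     (capacity reflects the new disk size once both replacements finish)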
Re: [zfs-discuss] A few questions
Just to add a bit to this, I just love sweeping generalizations...

On 9 Jan 2011, at 19:33 , Richard Elling wrote:

On Jan 9, 2011, at 4:19 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

From: Pasi Kärkkäinen [mailto:pa...@iki.fi]

Other OS's have had problems with the Broadcom NICs as well..

Yes. The difference is, when I go to support.dell.com and punch in my service tag, I can download updated firmware and drivers for RHEL that (at least supposedly) solve the problem. I haven't tested it, but the Dell support guy told me it has worked for RHEL users. There is nothing available to download for Solaris.

The drivers are written by Broadcom and are, AFAIK, closed source. By going through Dell, you are going through a middle-man. For example, http://www.broadcom.com/support/ethernet_nic/netxtremeii10.php where you see the release of the Solaris drivers was at the same time as Windows.

What Richard says is true. Broadcom has been a source of contention in the Linux world as well as the *BSD world due to the proprietary nature of their firmware. OpenSolaris/Solaris users are not the only ones who have complained about this. There's been much uproar in the FOSS community about Broadcom and their drivers. As a result, I've seen some pretty nasty hacks, like people using the Windows drivers linked into their kernel - *gack*. I forget all the gory details, but it was rather disgusting as I recall: bubblegum, baling wire, duct tape and all.

Dell and Red Hat aren't exactly a marriage made in heaven either. I've had problems getting support from both Dell and Red Hat, with them pointing fingers at each other rather than solving the problem. Like most people, I've had to come up with my own work-arounds, like others with the Broadcom issue, using a known-quantity NIC. When dealing with Dell as a corporate buyer, they have always made it quite clear that they are primarily a Windows platform. Linux, oh yes, we have that too...

Also, the bcom is not the only problem on that server. After I added an Intel network card and disabled the bcom, the weekly crashes stopped, but now it's ... I don't know ... once every 3 weeks with a slightly different mode of failure. This is yet again rare enough that the system could very well pass a certification test, but not rare enough for me to feel comfortable putting it into production as a primary mission-critical server.

I've never been particularly warm and fuzzy with Dell servers. They seem to like to change their chipsets slightly while a model is in production. This can cause all sorts of problems which are difficult to diagnose, since an identical Dell system will have no problems while its mate will crash weekly.

I really think there are only two ways in the world to engineer a good solid server: (a) Smoke your own crack. Systems engineering teams use the same systems that are sold to customers. This is rarely practical, not to mention that product development is often not in the systems engineering organization. Or (b) Sell millions of 'em. So regardless of whether the engineering team uses them, you're still going to have sufficient mass to dedicate engineers to the purpose of post-sales bug solving.

yes, indeed :-)
-- richard

As for certified systems, it's my understanding that Nexenta themselves don't certify anything. They have systems which are recommended and supported by their network of VARs. It just so happens that SuperMicro is one of the brands of choice, but even then one must adhere to a fairly tight HCL.
The same holds true for Solaris/OpenSolaris with third-party hardware. SATA controllers and multiplexers are another example where the drivers are written by the manufacturer, and Solaris/OpenSolaris is not a priority over Windows and Linux, in that order. Deviating from plain-vanilla items that are listed on the HCL is just asking for trouble.

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Mobile: +1-662-202-7716
US Phone: +1-561-283-2034
JP Phone: +81-50-5806-6242
Re: [zfs-discuss] Running on Dell hardware?
Congratulations Ed, and welcome to open systems…

Ah, but Nexenta is open and has no vendor lock-in. What you probably should have done is bank everything on Illumos and Nexenta, a winning combination by all accounts. But then again, you could have used Linux on any hardware as well; then your hardware and software issues would probably be multiplied even more.

Cheers,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Oct 2010, at 12:53 , Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Kyle McDonald

I'm currently considering purchasing 1 or 2 Dell R515's. With up to 14 drives and up to 64GB of RAM, it seems like it's well suited for a low-end ZFS server. I know this box is new, but I wonder if anyone out there has any experience with it? How about the H700 SAS controller? Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I want to put some SSDs in a box like this, but there's no way I'm going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are they kidding?

You are asking for a world of hurt. You may luck out, and it may work great, thus saving you money. Take my example, for example ... I took the safe approach (as far as any non-Sun hardware is concerned). I bought an officially supported Dell server, with all Dell-blessed and Solaris-supported components, with support contracts on both the hardware and software, fully patched and updated on all fronts, and I am getting system failures approximately once per week. I have support tickets open with both Dell and Oracle right now ... Have no idea how it's all going to turn out.

But if you have a problem like mine, using unsupported hardware, you have no alternative. You're up a tree full of bees, naked, with a hunter on the ground trying to shoot you. And IMHO, I think the probability of having a problem like mine is higher when you use the unsupported hardware. But of course there's no definable way to quantify that belief.

My advice to you is: buy the supported hardware, and the support contracts for both the hardware and software. But of course, that's all just a calculated risk, and I doubt you're going to take my advice. ;-)
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Everyone,

Thanks for the help. I really appreciate it.

Well, I actually walked through the source code with an associate today, and we found out how things work by looking at the code. It appears that L2ARC devices are just assigned in round-robin fashion. If a device goes offline, then it goes to the next one and marks that one as offline. The failure to retrieve the requested object is treated like a cache miss and everything goes along its merry way, as far as we can tell.

I would have hoped it to be different in some way. If the L2ARC were striped for performance reasons, that would be really cool, and in keeping with the VM model it is patterned after: using the L2ARC as an extension of the virtual address space and striping it to make it more efficient. Way cool. If it took out the bad device and reconfigured the stripe, that would be even cooler. Replacing it with a hot spare, more cool still.

However, it appears from the source code that the L2ARC is just a (sort of) jumbled collection of ZFS objects. Yes, it gives you better performance if you have it, but it doesn't really use the devices in the way you might expect something as cool as ZFS would.

I understand why it is read-only, and that it invalidates its cache when a write occurs; that is to be expected for any object written. If an object is not there because of a failure or because it has been removed from the cache, it is treated as a cache miss, all well and good: go fetch from the pool.

I also understand why the ZIL is important and that it should be mirrored if it is to be on a separate device. Though I'm wondering how it is handled internally when there is a failure of one of its default devices; but then again, by default it lives on the regular pool and should be redundant enough, with only some degradation in speed.

Breaking these devices out from their default locations is great for performance, and I understand that. I just wish the knowledge of how they work and their internal mechanisms were not so much of a black box. Maybe that is due to the speed at which ZFS is progressing and the features it adds with each subsequent release.

Overall, I am very impressed with ZFS and its flexibility, and even more so with the way it breaks all the rules about how storage should be managed. I really like it. I have yet to see anything come close in its approach to disk data management. Let's just hope it keeps moving forward; it is truly a unique way to view disk storage.

Anyway, sorry for the ramble, but to everyone, thanks again for the answers.

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 7 May 2010, at 00:00 , Robert Milkowski wrote:

On 06/05/2010 15:31, Tomas Ögren wrote:

On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:

On Wed, 5 May 2010, Edward Ned Harvey wrote:

In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it.

How do you know that I don't need it? The ability seems useful to me.

The gain is quite minimal.. If the first device fails (which doesn't happen too often, I hope), then it will be read from the normal pool once and then stored in ARC/L2ARC again. It just behaves like a cache miss for that specific block... If this happens often enough to become a performance problem, then you should throw away that L2ARC device because it's broken beyond usability.
Well, if an L2ARC device fails there might be an unacceptable drop in delivered performance. If it were mirrored then the drop usually would be much smaller, or there could be no drop at all if a mirror had an option to read from only one side.

Being able to mirror L2ARC might especially be useful once a persistent L2ARC is implemented, as after a node restart or a resource failover in a cluster the L2ARC will be kept warm. Then the only thing which might affect L2 performance considerably would be an L2ARC device failure...

--
Robert Milkowski
http://milek.blogspot.com
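To make the behaviour discussed in this thread concrete, a minimal sketch (hypothetical pool and device names) of adding independent cache devices and observing that a failed one does not take the pool down:

  # zpool add tank cache c2t0d0 c2t1d0 c2t2d0 c2t3d0   (each cache device is independent; there is no mirroring)
  # zpool iostat -v tank                               (the cache devices appear in their own section with per-device stats)
  # zpool status tank                                  (a failed SSD is reported, the pool stays ONLINE, and misses
                                                        are simply served from the main pool)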
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Marc,

Well, if you are striping over multiple devices, your I/O should be spread over the devices and you should be reading from them all simultaneously rather than accessing a single device. Traditional striping cuts the time for a read to roughly 1/n, where n is the number of disks the stripe is spread across, instead of 1/1 for a single disk.

The round-robin access I am referring to is the way the L2ARC vdevs appear to be accessed: any given object will be taken from a single device rather than from several devices simultaneously, which would increase the I/O throughput. So, theoretically, a stripe spread over 4 disks would give 4 times the performance of reading from a single disk. This also assumes the controller can handle multiple I/Os, or that you are striped over a different disk controller for each disk in the stripe.

SSDs are fast, but if I can read a block from more devices simultaneously, it will cut the latency of the overall read.

On 7 May 2010, at 02:57 , Marc Nicholas wrote:

Hi Michael,

What makes you think striping the SSDs would be faster than round-robin?

-marc

On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan michael.p.sulli...@mac.com wrote:

[Michael's earlier message and Robert Milkowski's reply, quoted in full in the original posts above, trimmed here.]
Re: [zfs-discuss] why both dedup and compression?
This is interesting, but what about iSCSI volumes for virtual machines? Compress or de-dupe? Assume the virtual machine was made from a clone of the original iSCSI or a master iSCSI volume. Does anyone have any real-world data on this? I would think the iSCSI volumes would diverge quite a bit over time, even with compression and/or de-duplication.

Just curious…

On 6 May 2010, at 16:39 , Peter Tribble wrote:

On Thu, May 6, 2010 at 2:06 AM, Richard Jahnel rich...@ellipseinc.com wrote:

I've googled this for a bit, but can't seem to find the answer. What does compression bring to the party that dedupe doesn't cover already?

Compression will reduce the storage requirements for non-duplicate data. As an example, I have a system that I rsync the web application data from a whole bunch of servers (zones) to. There's a fair amount of duplication in the application files (java, tomcat, apache, and the like), so dedup is a big win. On the other hand, there's essentially no duplication whatsoever in the log files, which are pretty big but compress really well. So having both enabled works really well.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
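For anyone wanting to experiment with this question, a minimal sketch of the setup being described: a master iSCSI zvol cloned per VM, with compression and dedup inherited from the parent dataset. The pool and dataset names are hypothetical, and dedup assumes a pool version recent enough to support it:

  # zfs create -o compression=on -o dedup=on tank/iscsi      (container whose properties the zvols inherit)
  # zfs create -V 20g tank/iscsi/master                      (master zvol, installed once and then snapshotted)
  # zfs snapshot tank/iscsi/master@gold
  # zfs clone tank/iscsi/master@gold tank/iscsi/vm01         (clones share blocks until they diverge;
  # zfs clone tank/iscsi/master@gold tank/iscsi/vm02          dedup can reclaim space when diverged blocks are identical)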
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Ed,

Thanks for your answers. They seem to make sense, sort of…

On 6 May 2010, at 12:21 , Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Michael Sullivan

I have a question I cannot seem to find an answer to.

Google for "ZFS Best Practices Guide" (on solarisinternals). I know this answer is there.

My Google is very strong, and I have the Best Practices Guide committed to bookmark as well as most of it to memory. While it explains how to implement these, there is no information regarding failure of a device in a striped L2ARC set of SSDs. I have been hard pressed to find this information anywhere, short of testing it myself, but I don't have the necessary hardware in a lab to test correctly. If someone has pointers to references, could you please provide them, chapter and verse, rather than the advice to "go read the manual."

I know if I set up the ZIL on an SSD and the SSD goes bad, the ZIL will be relocated back to the pool. I'd probably have it mirrored anyway, just in case. However you cannot mirror the L2ARC, so...

Careful. The log device removal feature exists, and is present in the developer builds of OpenSolaris today. However, it's not included in OpenSolaris 2009.06, and it's not included in the latest and greatest Solaris 10 yet. Which means, right now, if you lose an unmirrored ZIL (log) device, your whole pool is lost, unless you're running a developer build of OpenSolaris.

I'm running 2009.11, which is the latest OpenSolaris. I should have made that clear, along with the fact that I don't intend this to be on a Solaris 10 system, and I am waiting for the next production build anyway. As you say, it does not exist in 2009.06, but that is not the latest production OpenSolaris, which is 2009.11, and I'd be more interested in its behavior than an older release.

I am also well aware that the loss of a ZIL device will cause loss of the entire pool. Which is why I would never have a ZIL device unless it was mirrored and on different controllers.

From the information I've been reading, on the loss of a ZIL device it will be relocated back to the storage pool it is assigned to. I'm not sure which version this is in, but it would be nice if someone could provide the release number it is included in (and actually works in). Also, will this functionality be included in the mythical 2010.03 release? I'd be interested to know what features along these lines will be available in 2010.03, if it ever sees the light of day.

What I want to know is what happens if one of those SSDs goes bad? What happens to the L2ARC? Is it just taken offline, or will it continue to perform even with one drive missing?

In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it. If one of the cache devices fails, no harm is done. That device goes offline. The rest stay online.

So what you are saying is that if a single device fails in a striped L2ARC vdev, then the entire vdev is taken offline and the fallback is to simply use the regular ARC and fetch from the pool whenever there is a cache miss. Or does what you are saying here mean that if I have 4 SSDs in a stripe for my L2ARC and one device fails, the L2ARC will be reconfigured dynamically using the remaining SSDs?
It would be good to get an answer to this from someone who has actually tested it or is more intimately familiar with the ZFS code, rather than all the speculation I've been getting so far.

Sorry if these questions have been asked before, but I cannot seem to find an answer.

Since you said this twice, I'll answer it twice. ;-) I think the best advice regarding cache/log device mirroring is in the ZFS Best Practices Guide.

Been there, read that, many, many times. It's an invaluable reference, I agree.

Thanks

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
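For context, a minimal sketch of the configuration being argued for in this exchange: a mirrored slog spread across controllers plus independent cache devices. Device names are hypothetical:

  # zpool add tank log mirror c3t0d0 c4t0d0    (mirrored ZIL, one SSD per controller)
  # zpool add tank cache c3t1d0 c4t1d0         (cache devices are always independent; they cannot be mirrored)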
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 6 May 2010, at 13:18 , Edward Ned Harvey wrote:

From: Michael Sullivan [mailto:michael.p.sulli...@mac.com]

While it explains how to implement these, there is no information regarding failure of a device in a striped L2ARC set of SSDs. I have

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices

"It is not possible to mirror or use raidz on cache devices, nor is it necessary. If a cache device fails, the data will simply be read from the main pool storage devices instead."

I understand this. I guess I didn't write this part, but:

"If you have multiple cache devices, they are all independent from each other. Failure of one does not negate the functionality of the others."

Ok, this is what I wanted to know: the L2ARC devices assigned to the pool are not striped but are independent. Loss of one drive will just cause a cache miss and force ZFS to go out to the pool for its objects.

But then, I'm not talking about using RAIDZ on a cache device. I'm talking about a striped device, which would be RAID-0.

If the SSDs are all assigned to L2ARC, then they are not striped in any fashion (RAID-0), but are completely independent, and the L2ARC will continue to operate, just missing a single SSD.

I'm running 2009.11 which is the latest OpenSolaris.

Quoi?? 2009.06 is the latest available from opensolaris.com and opensolaris.org. If you want something newer, AFAIK, you have to go to a developer build, such as osol-dev-134. Sure you didn't accidentally get 2008.11?

My mistake… snv_111b, which is 2009.06. I know it went up to 11 somewhere.

I am also well aware that the loss of a ZIL device will cause loss of the entire pool. Which is why I would never have a ZIL device unless it was mirrored and on different controllers.

Um ... the log device is not special. If you lose *any* unmirrored device, you lose the pool. Except for cache devices, or log devices on zpool >= 19.

Well, if I've got a separate ZIL which is mirrored for performance, and mirrored because I think my data is valuable and important, I will have something more than RAID-0 on my main storage pool too. More than likely RAIDZ2, since I plan on using L2ARC to help improve performance along with separate mirrored SSD ZIL devices.

From the information I've been reading about the loss of a ZIL device, it will be relocated to the storage pool it is assigned to. I'm not sure which version this is in, but it would be nice if someone could provide the release number it is included in (and actually works).

What the heck? Didn't I just answer that question? I know I said this is answered in the ZFS Best Practices Guide. http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

"Prior to pool version 19, if you have an unmirrored log device that fails, your whole pool is permanently lost. Prior to pool version 19, mirroring the log device is highly recommended. In pool version 19 or greater, if an unmirrored log device fails during operation, the system reverts to the default behavior, using blocks from the main storage pool for the ZIL, just as if the log device had been gracefully removed via the zpool remove command."

No need to get defensive here; all I'm looking for is the zpool version number which supports it and the version of OpenSolaris which supports that zpool version. I think that if you are building for performance, it would be almost intuitive to have a mirrored ZIL in the event of failure, and perhaps even a hot spare available as well.
I don't like the idea of my ZIL being transferred back to the pool, but having it transferred back is better than the alternative, which would be data loss or corruption.

Also, will this functionality be included in the mythical 2010.03 release?

Zpool 19 was released in build 125, Oct 16, 2009. You can rest assured it will be included in 2010.03, or 04, or whenever that thing comes out.

Thanks, build 125.

So what you are saying is that if a single device fails in a striped L2ARC vdev, then the entire vdev is taken offline and the fallback is to simply use the regular ARC and fetch from the pool whenever there is a cache miss.

It sounds like you're only going to believe it if you test it. Go for it. That's what I did before I wrote that section of the ZFS Best Practices Guide. In ZFS, there is no such thing as striping, although the term is commonly used, because adding multiple devices creates all the benefit of striping plus all the benefit of concatenation; but colloquially, people think concatenation is weird or unused or something, so people just naturally gravitated to calling it a "stripe" in ZFS too, although that's not technically correct according to the traditional RAID definition. But nobody bothered to create a new term, "stripecat" or whatever, for ZFS.

Ummm, yes
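Since this exchange keeps circling back to which pool version introduced log-device removal, a couple of commands answer that kind of question directly. This is a general sketch (the pool name is hypothetical), not something quoted from the thread:

  # zpool upgrade -v           (lists every pool version and the features it added)
  # zpool get version tank     (shows which version a given pool is currently running)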
[zfs-discuss] Loss of L2ARC SSD Behaviour
Hi,

I have a question I cannot seem to find an answer to.

I know I can set up a stripe of L2ARC SSDs with, say, 4 SSDs.

I know if I set up the ZIL on an SSD and the SSD goes bad, the ZIL will be relocated back to the pool. I'd probably have it mirrored anyway, just in case. However you cannot mirror the L2ARC, so...

What I want to know is what happens if one of those SSDs goes bad? What happens to the L2ARC? Is it just taken offline, or will it continue to perform even with one drive missing?

Sorry if these questions have been asked before, but I cannot seem to find an answer.

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Ok, thanks. So, if I understand correctly, it will just remove the device from the vdev and continue to use the good ones in the stripe.

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 5 May 2010, at 04:34 , Marc Nicholas wrote:

The L2ARC will continue to function.

-marc

On 5/4/10, Michael Sullivan michael.p.sulli...@mac.com wrote:

[Original question, quoted in full in the post above, trimmed here.]
Re: [zfs-discuss] Oracle to no longer support ZFS on OpenSolaris?
Bogdan,

Thanks for pointing this out and passing along the latest news from Oracle. Stamp out FUD wherever possible. At this point, unless it is said officially (and Oracle generally keeps pretty tight-lipped about products and directions), people should regard most things as hearsay.

Cheers,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Apr 2010, at 10:22 , BM wrote:

On Tue, Apr 20, 2010 at 2:18 PM, Ken Gunderson kgund...@teamcool.net wrote:

Greetings All: Granted there has been much fear, uncertainty, and doubt following Oracle's takeover of Sun, but I ran across this on a FreeBSD mailing list post dated 4/20/2010 ... Seems that Oracle won't offer support for ZFS on OpenSolaris. Link to the full post here: http://lists.freebsd.org/pipermail/freebsd-questions/2010-April/215269.html

I am not surprised it comes from a FreeBSD mailing list. :) I am amazed at their BSD conferences, where they present all this *BSD stuff using Apple Macs (they claim it is FreeBSD, just a very bad version of it), Ubuntu Linux (not yet BSD) or GNU/Microsoft Windows (oh, everybody does that sin, right?) with PowerPoint running on it (sure, who wants ugly OpenOffice if there is not brain enough to use LaTeX).

As a starter, please somebody read this: http://developers.sun.ru/techdays2010/reports/OracleSolarisTrack/TD_STP_OracleSolarisFuture_Roberts.pdf

...and thus I suggest that people refrain from broadcasting complete garbage from trash-dump places and spreading this kind of FUD to the public, just shaking the air with no meaning behind it.

Take care.

--
Kind regards, BM
Things, that are stupid at the beginning, rarely ends up wisely.
Re: [zfs-discuss] recursive snaptshot
On 23 Jun 2009, at 23:59 , Darren J Moffat wrote:

Harry Putnam wrote:

I thought I recalled reading somewhere that in the situation where you have several zfs filesystems under one top level directory like this:

rpool
rpool/ROOT/osol-112
rpool/export
rpool/export/home
rpool/export/home/reader

you could do a snapshot encompassing everything below rpool instead of having to do it at each level. (Maybe it was in a dream...)

zfs snapshot -r rpool@mysnapshotnamegoeshere

Personally, I'd like to see it have a property to exclude filesystems, like:

zfs set nosnap=yes rpool/swap
zfs set nosnap=yes rpool/dump

in order to exclude filesystems whose snapshots I don't really care about. That would be handier than having to remove them every time.

Cheers,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
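Until something like a nosnap property exists, the workaround implied above is to take the recursive snapshot and then drop the child snapshots you don't want. A minimal sketch, with a hypothetical snapshot name:

  # zfs snapshot -r rpool@backup-20090623
  # zfs destroy rpool/swap@backup-20090623    (discard the snapshots of datasets you don't care about)
  # zfs destroy rpool/dump@backup-20090623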
Re: [zfs-discuss] recursive snaptshot
On 24 Jun 2009, at 01:01 , Harry Putnam wrote:

Darren J Moffat darr...@opensolaris.org writes:

Harry Putnam wrote:

I thought I recalled reading somewhere that in the situation where you have several zfs filesystems under one top level directory like this:

rpool
rpool/ROOT/osol-112
rpool/export
rpool/export/home
rpool/export/home/reader

you could do a snapshot encompassing everything below rpool instead of having to do it at each level. (Maybe it was in a dream...)

zfs snapshot -r rpool@mysnapshotnamegoeshere

Well no, I posted the question because that doesn't do it.

zfs list -r rpool
[...]
rpool/dump                1.50G  292G  1.50G  -
rpool/export              15.9G  292G    21K  /export
rpool/export/home         15.9G  292G    22K  /export/home
rpool/export/home/reader  15.9G  292G  11.5G  /export/home/reader
[...]

# zfs snapshot -r rpool@somedate
# zfs snapshot rpool@someotherdate
# cd /rpool/.zfs/snapshot
# diff -r somedate someotherdate

No difference, and there is no rpool/dump, rpool/export, rpool/export/home, or rpool/export/home/reader under either snapshot... not to mention all the other stuff shown with zfs list -r rpool that I snipped.

Try:

zfs list -t all

or

zfs list -t snapshot

You could also set the pool property so snapshots are listed by zfs list (the default is off):

zpool set listsnaps=on rpool

Cheers,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034
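One likely reason the diff above showed nothing is that each dataset exposes its own snapshots under its own mountpoint, so the children never appear under /rpool/.zfs. A small illustrative sketch using the layout quoted above; this is an inference about the poster's situation, not something confirmed in the thread:

  # zfs list -t snapshot -r rpool             (shows a @somedate snapshot for every child dataset)
  # ls /export/home/reader/.zfs/snapshot      (a child's snapshots live under that child's own mountpoint,
                                               not under /rpool/.zfs/snapshot)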
Re: [zfs-discuss] two pools on boot disk?
Fajar,

Yes, you could probably do send/receive from one pool to another, but that would be somewhat more time consuming, and you'd have to make sure everything was right in your GRUB menu.lst as well as the boot blocks, not to mention the potential for namespace collisions when dealing with a root pool. But this is missing my point.

The thing I found more interesting was that a pool could be increased in space by doing a zpool replace with a larger disk. This means that if, say, you have a pool of 100GB disks and you want to increase the size, you can replace them with bigger disks, effectively growing the pool. I'm not sure how this works out with configurations other than RAID 0 and RAID 1, but I thought it was a pretty nice feature knowing I can put bigger disks in really easily.

I also agree the installer should have an expert mode for configuring disks. The all-or-nothing approach is easy for people who have never been exposed to Solaris or OpenSolaris, but leaves people out in the cold if you wish to have a different configuration for your disks. The Automated Installer is supposed to give this sort of flexibility, but I haven't tried it out yet.

Regards,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 22 Jun 2009, at 11:00 , Fajar A. Nugraha wrote:

On Sat, Jun 20, 2009 at 7:02 PM, Michael Sullivan michael.p.sulli...@mac.com wrote:

One really interesting bit is how easy it is to make the disk in a pool bigger by doing a zpool replace on the device. It couldn't have been any easier with ZFS.

It's interesting how you achieved that, although it'd be much easier if the installer supported that from the GUI instead of having to use zpool replace as a workaround. I believe using export-import as described in http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#ZFS_Root_Pool_Recovery should also work.

--
Fajar
Re: [zfs-discuss] two pools on boot disk?
Hi Charles,

Works fine. I did just that with my home system. I have 2x 0.5 TB disks which I didn't want to dedicate to rpool, and I wanted to create a second pool on those disks which could be expanded. I set up the rpool to be 100GB, and that left me with a 400GB partition to make into an extended pool (xpool). There are probably some downsides to doing this, but I have yet to come across them at this point.

The reason I did this is to get around the limitation on rpool which restricts it to simple mirrors that cannot be added to in a striped configuration. After that was set up, I attached 2x 1 TB disks to the extended pool in a mirrored configuration.

Check out my blog entry, which explains exactly how to do this. The system I used in the demo is inside VirtualBox, but I have real hardware running in the configuration I mention. Using VirtualBox, I worked out the finer bits before trying it out on my live machine.

http://www.kamiogi.net/Kamiogi/Frame_Dragging/Entries/2009/5/10_OpenSolaris_Disk_Partitioning_and_the_Free_Hog.html

One really interesting bit is how easy it is to make the disk in a pool bigger by doing a zpool replace on the device. It couldn't have been any easier with ZFS.

I've even done a fresh install on this configuration just recently, and other than being exposed for a bit while I broke the mirrors to install a fresh copy of the OS, everything worked out all right. A few snags with namespace collisions when I re-imported the original rpool, but I'd already seen those before and wrote about them in another blog entry.

If you have any questions, feel free to let me know.

Cheers,

Mike
---
Michael Sullivan
michael.p.sulli...@me.com
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 20 Jun 2009, at 20:44 , Charles Hedrick wrote:

I have a small system that is going to be a file server. It has two disks. I'd like just one pool for data. Is it possible to create two pools on the boot disk, and then add the second disk to the second pool? The result would be a single small pool for root, and a second pool containing the rest of that disk plus the second disk? The installer seems to want to use the whole disk for the root pool. Is there a way to change that?

--
This message posted from opensolaris.org
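A minimal sketch of the layout being described, with hypothetical slice and device names (the blog post linked above has the full partitioning walk-through): a small rpool on one slice of each boot disk, a second pool on the remaining slices, and a second mirrored vdev added later:

  # zpool create xpool mirror c0t0d0s7 c0t1d0s7    (extended pool on the leftover slice of each boot disk)
  # zpool add xpool mirror c2t0d0 c2t1d0           (later, grow it by adding a second mirrored pair of whole disks)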