Re: [zfs-discuss] Best practice for boot partition layout in ZFS
On 4/6/2011 11:08 AM, Erik Trimble wrote: Traditionally, the reason for a separate /var was one of two major items: (a) /var was writable, and / wasn't - this was typical of diskless or minimal local-disk configurations. Modern packaging systems are making this kind of configuration increasingly difficult. (b) /var held a substantial amount of data, which needed to be handled separately from / - mail and news servers are a classic example. For typical machines nowadays, with large root disks, there is very little chance of /var suddenly exploding and filling / (the classic example of being screwed... wink). Outside of the above two cases, about the only other place I can see that having /var separate is a good idea is for certain test machines, where you expect frequent memory dumps (in /var/crash) - if you have a large amount of RAM, you'll need a lot of disk space, so it might be good to limit /var in this case by making it a separate dataset.

Some more info à la (b) - the "something filled up the root fs and the box crashed" problem was fixed a while ago. It's still a drag cleaning up after an errant process that is filling up a file system, but it shouldn't crash/panic anymore. However, old habits die hard, especially at government sites where the rules require a papal bull to be changed, so I think the option was left in to keep folks happy more than for any practical reason. I'm sure someone has a really good reason to keep /var separated, but those cases are fewer and farther between than I saw 10 years ago.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
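[For the crash-dump case above, a minimal sketch of carving /var out as its own capped dataset - pool name and quota are illustrative, and on a root pool this normally has to happen at install time:

  # zfs create -o mountpoint=/var rpool/var    (separate dataset for /var)
  # zfs set quota=20g rpool/var                (cap it so crash dumps can't fill /)

The quota gives you the old "limit /var" behavior without a separate slice.]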
Re: [zfs-discuss] ZFS Performance
On 2/25/2011 4:15 PM, Torrey McMahon wrote: On 2/25/2011 3:49 PM, Tomas Ögren wrote: On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes: Hi All, In reading the ZFS Best Practices, I'm curious if this statement is still true about 80% utilization. It happens at about 90% for me... all of a sudden, the mail server got butt slow... killed an old snapshot to get back to 85% or so, then it got snappy again. S10u9 sparc. Some of the recent updates have pushed the 80% watermark closer to 90% for most workloads.

Sorry folks. I was thinking of yet another change that was in the allocation algorithms. 80% is the number to stick with. ... now where did I put my cold medicine? :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
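[For anyone checking where they stand against that watermark, the standard command:

  # zpool list    (the CAP column is the utilization percentage being discussed)]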
Re: [zfs-discuss] ZFS Performance
On 2/25/2011 3:49 PM, Tomas Ögren wrote: On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes: Hi All, In reading the ZFS Best Practices, I'm curious if this statement is still true about 80% utilization. It happens at about 90% for me... all of a sudden, the mail server got butt slow... killed an old snapshot to get back to 85% or so, then it got snappy again. S10u9 sparc.

Some of the recent updates have pushed the 80% watermark closer to 90% for most workloads. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [storage-discuss] multipath used inadvertently?
in.mpathd is the IP multipath daemon. (Yes, it's a bit confusing that mpathadm is the storage multipath admin tool.) If scsi_vhci is loaded in the kernel you have storage multipathing enabled. (Check with modinfo.)

On 2/15/2011 3:53 PM, Ray Van Dolson wrote: I'm troubleshooting an existing Solaris 10U9 server (x86 whitebox) and noticed its device names are extremely hairy -- very similar to the multipath device names: c0t5000C50026F8ACAAd0, etc, etc. mpathadm seems to confirm:

# mpathadm list lu
/dev/rdsk/c0t50015179591CE0C1d0s2
Total Path Count: 1
Operational Path Count: 1

# ps -ef | grep mpath
root 245 1 0 Jan 05 ? 16:38 /usr/lib/inet/in.mpathd -a

The system is SuperMicro based with an LSI SAS2008 controller in it. To my knowledge it has no multipath capabilities (or at least not as it's wired up currently). The mpt_sas driver is in use per prtconf and modinfo. My questions are: - In what scenario would the multipath driver get loaded at installation time for this LSI controller? I'm guessing this is what happened? - If I disabled mpathd would I get the shorter disk device names back again? How would this impact existing zpools that are already on the system tied to these disks? I have a feeling doing this might be a little bit painful. :) I tried to glean the original device names from stmsboot -L, but it didn't show any mappings... Thanks, Ray ___ storage-discuss mailing list storage-disc...@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/storage-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
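[A quick way to tell the two apart (the grep pattern is just illustrative):

  # modinfo | grep -i vhci     (scsi_vhci present = storage multipathing, i.e. MPxIO)
  # pgrep -fl in.mpathd        (running = IP multipathing, unrelated to disk names)

The long c0t<WWN>d0 names come from MPxIO/scsi_vhci, not from in.mpathd.]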
Re: [zfs-discuss] One LUN per RAID group
On 2/14/2011 10:37 PM, Erik Trimble wrote: That said, given that SAN NVRAM caches are true write caches (and not a ZIL-like thing), it should be relatively simple to swamp one with write requests (most SANs have little more than 1GB of cache), at which point the SAN will be blocking on flushing its cache to disk.

Actually, most array controllers now have 10s if not 100s of GB of cache. The 6780 has 32GB, the DMX-4 has - if I remember correctly - 256GB. The latest HDS box is probably close if not more. Of course you still have to flush to disk, and the cache flush algorithms of the boxes themselves come into play, but 1GB was a long time ago. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best choice - file system for system
On 1/30/2011 5:26 PM, Joerg Schilling wrote: Richard Elling richard.ell...@gmail.com wrote: ufsdump is the problem, not ufsrestore. If you ufsdump an active file system, there is no guarantee you can ufsrestore it. The only way to guarantee this is to keep the file system quiesced during the entire ufsdump. Needless to say, this renders ufsdump useless for backup when the file system also needs to accommodate writes. This is why there is a ufs snapshot utility.

You'll have the same problem: fssnap_ufs(1M) write-locks the file system while the snapshot is created. See the Notes section of the man page. http://download.oracle.com/docs/cd/E19253-01/816-5166/6mbb1kq1p/index.html#Notes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
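[A sketch of the snapshot-then-dump sequence in question (backing-store path, mount point, and tape device are illustrative):

  # fssnap -F ufs -o bs=/var/tmp/snapstore /export/home
  /dev/fssnap/0
  # ufsdump 0uf /dev/rmt/0 /dev/rfssnap/0    (dump the stable snapshot, not the live fs)

The write-lock only covers snapshot creation, not the whole dump - which is the point of the utility, though it's why a busy file system can still notice it.]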
Re: [zfs-discuss] reliable, enterprise worthy JBODs?
On 1/25/2011 2:19 PM, Marion Hakanson wrote: The only special tuning I had to do was turn off round-robin load-balancing in the mpxio configuration. The Seagate drives were incredibly slow when running in round-robin mode, very speedy without. Interesting. Did you switch to the load-balance option? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
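[For anyone following along, the knob in question lives in /kernel/drv/scsi_vhci.conf on Solaris 10 and takes effect after a reconfigure reboot:

  load-balance="none";           (what was described above)
  load-balance="round-robin";    (the default)

Newer builds also offer a "logical-block" policy.]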
Re: [zfs-discuss] How well does zfs mirror handle temporary disk offlines?
On 1/18/2011 2:46 PM, Philip Brown wrote: My specific question is, how easily does ZFS handle *temporary* SAN disconnects, to one side of the mirror? What if the outage is only 60 seconds? 3 minutes? 10 minutes? an hour?

Depends on the multipath drivers and the failure mode. For example, if the link drops completely at the host HBA connection, some failover drivers will mark the path down immediately, which will propagate up the stack faster than an intermittent connection or something failing farther downstream.

If we have 2x1TB drives in a simple zfs mirror and one side goes temporarily offline, will zfs attempt to resync **1 TB** when it comes back? Or does it have enough intelligence to say, "oh hey, I know this disk... and I know [these bits] are still good, so I just need to resync [that bit]"?

My understanding is yes, though I can't find the reference for this. (I'm sure someone else will find it in short order.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
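[For the record: resilver after a transient outage is driven by ZFS's dirty-time logs, so only data written while the device was away gets copied. What it looks like in practice (pool and device names illustrative):

  # zpool online tank c2t0d0    (if ZFS didn't already pick the path back up)
  # zpool status tank           (the resilver should be proportional to the changes,
                                 not to the 1TB)]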
Re: [zfs-discuss] Changing GUID
Are those really your requirements? What is it that you're trying to accomplish with the data? Make a copy and provide it to another host?

On 11/15/2010 5:11 AM, sridhar surampudi wrote: Hi, I am looking along similar lines. My requirement is: 1. create a zpool on one or many devices (LUNs) from an array (the array can be IBM or HP EVA or EMC etc., not SS7000). 2. Create file systems on the zpool. 3. Once file systems are in use (I/O is happening) I need to take a snapshot at the array level: a. Freeze the zfs file system (not required due to zfs consistency: source: mailing groups) b. take the array snapshot (say, IBM FlashCopy) c. Got a new snapshot device (having the same data and metadata, including the same GUID, as the source pool). Now I need a way to change the GUID and pool name of the snapshot device so that the snapshot device can be accessible on the same host or an alternate host (if the LUN is shared). Could you please post commands for the same. Regards, sridhar. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
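[For the alternate-host case, no GUID change should be needed - the copy just imports (pool name illustrative):

  # zpool import           (lists pools found on the visible devices)
  # zpool import datapool

The same-host case is the hard one: two sets of devices carrying the same pool GUID confuse import, and as far as I know the stock tools of this era don't offer a reguid/rename that solves it.]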
Re: [zfs-discuss] ZFS no longer working with FC devices.
On 5/23/2010 11:49 AM, Richard Elling wrote: FWIW, the A5100 went end-of-life (EOL) in 2001 and end-of-service-life (EOSL) in 2006. Personally, I hate them with a passion and would like to extend an offer to use my tractor to bury the beast :-). I'm sure I can get some others to help.

Can I smash the GBICs? Those were my favorite. :-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpxio load-balancing...it doesn't work??
Not true. There are different ways that a storage array, and its controllers, connect to the host-visible front-end ports, which might be confusing the author, but I/O isn't duplicated as he suggests.

On 4/4/2010 9:55 PM, Brad wrote: I had always thought that with mpxio, it load-balances IO requests across your storage ports, but this article http://christianbilien.wordpress.com/2007/03/23/storage-array-bottlenecks/ has got me thinking it's not true. "The available bandwidth is 2 or 4Gb/s (200 or 400MB/s - FC frames are 10 bytes long -) per port. As load balancing software (Powerpath, MPXIO, DMP, etc.) are most of the times used both for redundancy and load balancing, I/Os coming from a host can take advantage of an aggregated bandwidth of two ports. However, reads can use only one path, but writes are duplicated, i.e. a host write ends up as one write on each host port." Is this true? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpxio load-balancing...it doesn't work??
The author mentions multipathing software in the blog entry. Kind of hard to mix that up with cache mirroring if you ask me.

On 4/5/2010 9:16 PM, Brad wrote: I'm wondering if the author is talking about cache mirroring, where the cache is mirrored between both controllers. If that is the case, is he saying that for every write to the active controller, a second write is issued on the passive controller to keep the cache mirrored? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] demise of community edition
This is a topic for indiana-discuss, not zfs-discuss. If you read through the archives of that alias you should see some pointers.

On 1/31/2010 11:38 AM, Tom Bird wrote: Afternoon, I note to my dismay that I can't get the community edition any more past snv_129. This version was closest to the normal way of doing things that I am used to with Solaris <= 10; the standard OpenSolaris releases seem only to have this horrible Gnome-based installer that gives you only one option - install everything. Am I just doing it wrong, or is there another way to get OpenSolaris installed in a sane manner other than just sticking with the community edition at snv_129? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On 1/8/2010 10:04 AM, James Carlson wrote: Mike Gerdts wrote: This unsupported feature is supported with the use of Sun Ops Center 2.5 when a zone is put on a NAS Storage Library. Ah, ok. I didn't know that. Does anyone know how that works? I can't find it in the docs, no one inside of Sun seemed to have a clue when I asked around, etc. RTFM gladly taken. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and LiveUpgrade
Make sure you have the latest LU patches installed. There were a lot of fixes put back in that area within the last six months or so. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thin device support in ZFS?
On 12/30/2009 2:40 PM, Richard Elling wrote: There are a few minor bumps in the road. The ATA PASSTHROUGH command, which allows TRIM to pass through the SATA drivers, was just integrated into b130. This will be more important to small servers than SANs, but the point is that all parts of the software stack need to support the effort. As such, it is not clear to me who, if anyone, inside Sun is champion for the effort -- it crosses multiple organizational boundaries.

I'd think it more important for devices where this is an issue, namely SSDs, than it is for spinning rust, though use of the TRIM command, or something like it, would fix a lot of the issues I've seen with thin provisioning over the last six years or so. However, I'm not sure it's going to be much of an impact until you can get the entire stack - application to device - rewired to work with the concept behind it. One of the biggest issues I've seen with thin provisioning is how the applications work, and you can't fix that in the file system code. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] primarycache and secondarycache properties on Solaris 10 u8
Suggest you start with the man page: http://docs.sun.com/app/docs/doc/819-2240/zfs-1m

On 10/15/2009 4:19 PM, Javier Conde wrote: Hello, I've seen in the What's New for the just-released Solaris 10 update 8 that ZFS now includes the primarycache and secondarycache properties. Is this the equivalent of UFS directio? Does it have a similar behavior? I'm thinking about having a database on ZFS with this option, and Oracle recommends directio when working on top of a file system. Thanks in advance and best regards, Javi ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
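[Short version of what the man page says (dataset name illustrative):

  # zfs set primarycache=metadata tank/oradata    (ARC keeps metadata only)
  # zfs set secondarycache=none tank/oradata      (nothing pushed to L2ARC)

Valid values are all, none, and metadata. These control what ZFS caches, which is related to but not the same as UFS directio's cache-bypass semantics - I/O still flows through the ARC machinery.]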
Re: [zfs-discuss] Petabytes on a budget - blog
As some Sun folks pointed out: 1) No redundancy on the power or networking side 2) Getting 2TB drives in an x4540 would make the numbers closer 3) Performance isn't going to be that great with their design but... they might not need it.

On 9/2/2009 2:13 PM, Michael Shadle wrote: Yeah I wrote them about it. I said they should sell them and, even better, pair it with their offsite backup service - kind of like a massive appliance and service option. They're not selling them but did encourage me to just make a copy of it. It looks like the only questionable piece in it is the port multipliers. Sil3726 if I recall. Which I think just barely is becoming supported in the most recent snvs? That's been something I've been wanting forever anyway. You could also just design your own case that is optimized for a bunch of disks, a mobo as long as it has ECC support and enough pci/pci-x/pcie slots for the amount of cards to add. You might be able to build one without port multipliers and just use a bunch of 8, 12, or 16 port sata controllers. I want to design a case that has two layers - an internal layer with all the drives and guts and an external layer that pushes air around it to exhaust it quietly and has additional noise dampening... Sent from my iPhone

On Sep 2, 2009, at 11:01 AM, Al Hopper a...@logical-approach.com wrote: Interesting blog: http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ Regards, -- Al Hopper Logical Approach Inc, Plano, TX a...@logical-approach.com Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Compression/copies on root pool RFE
Before I put one in ... anyone else seen one? Seems we support compression on the root pool but there is no way to enable it at install time outside of a custom script you run before the installer. I'm thinking it should be a real install time option, have a jumpstart keyword, etc. Same with copies=2. Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
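[The workaround referred to, roughly - get a shell in after the installer creates the root pool but before it lays the files down, then (a sketch; pool/dataset names per a standard install):

  # zfs set compression=on rpool
  # zfs set copies=2 rpool/ROOT

Properties only affect data written after they're set, which is why it has to happen before the packages land rather than after the install finishes.]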
Re: [zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On 5/1/2009 2:01 PM, Miles Nordin wrote: I've never heard of using multiple-LUN stripes for storage QoS before. Have you actually measured some improvement in this configuration over a single LUN? If so that's interesting.

Because of the way queuing works in the OS and in most array controllers, you can get better performance in some workloads if you create more LUNs from the underlying raid set. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
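[The OS-side queue involved here is per-LUN. The knob, for reference (the value is illustrative - use your array vendor's recommendation):

  * /etc/system (comment lines in this file start with *)
  set ssd:ssd_max_throttle=32    (sparc FC; x86/SAS uses sd:sd_max_throttle)

N LUNs carved from one RAID set give you roughly N of these queues, which is where the extra concurrency comes from.]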
Re: [zfs-discuss] StorageTek 2540 performance radically changed
On 4/20/2009 7:26 PM, Robert Milkowski wrote: Well, you need to disable cache flushes on zfs side then (or make a firmware change work) and it will make a difference. If you're running recent OpenSolaris/Solaris/SX builds you shouldn't have to disable cache flushing on the array. The driver stack should set the correct modes. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
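[For reference, the tunable under discussion - only ever appropriate when every pool on the box sits behind non-volatile cache, and unnecessary on builds that negotiate the modes correctly:

  * /etc/system (comment lines in this file start with *)
  set zfs:zfs_nocacheflush = 1

It's system-wide, which is the main reason a per-array no-op setting is preferable.]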
Re: [zfs-discuss] ZFS vs ZFS + HW raid? Which is best?
On 1/20/2009 1:14 PM, Richard Elling wrote: Orvar Korvar wrote: What does this mean? Does that mean that ZFS + HW raid with raid-5 is not able to heal corrupted blocks? Then this is evidence against ZFS + HW raid, and you should only use ZFS? http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide "ZFS works well with storage based protected LUNs (RAID-5 or mirrored LUNs from intelligent storage arrays). However, ZFS cannot heal corrupted blocks that are detected by ZFS checksums." It means that if ZFS does not manage redundancy, it cannot correct bad data.

And there's no rule that says you can't take two array raid volumes, of any level, and mirror them with ZFS. (Or a few LUNs with RAID-Z.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
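[A sketch of that layout - device names are illustrative, and each device here is itself a RAID-5 LUN presented by the array:

  # zpool create tank mirror c2t0d0 c3t0d0

Now ZFS has redundancy it manages, so a block that fails its checksum on one LUN can be rewritten from the copy on the other.]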
Re: [zfs-discuss] Zero page reclaim with ZFS
Cyril Payet wrote: Hello there, the Hitachi USP-V (sold as the 9990V by Sun) provides thin provisioning, known as Hitachi Dynamic Provisioning (HDP). This gives a way to make the OS believe that a huge LUN is available whilst its size is not physically allocated on the array side. A simple example: 100GB seen by the OS but only 50GB physically allocated in the frame, in a stock of physical devices (called an HDP pool). The USP-V is now able to reclaim zero pages that are not used by a filesystem. It can then put them back into this physical pool, freeing many 42MB blocks. As far as I know, when a file is deleted, zfs just stops referencing the blocks associated with this file, like an MMU does with RAM. Blocks are not deleted, nor zeroed (sounds very good for getting back some files after a crash!). Is there a way to transform - a posteriori or a priori - these unreferenced blocks into zero blocks, to make the HDS frame able to reclaim them? I know that this would create some overhead... It might lead to a smaller block allocation history but could be very useful for zero-page reclaim. I do hope that my question was clear enough... Thanks for your hints,

There are some mainframe filesystems that do such things. I think there was also an STK array - Iceberg[?] - that had similar functionality. However, why would you use ZFS on top of HDP? If the filesystem lets you grow dynamically, and the OS lets you add storage dynamically or grow the LUNs when the array does... what does HDP get you? Serious question, as I get asked it all the time and I can't come up with a good answer outside of procedural things such as, "We don't like to bother the storage guys" or, "We thin provision everything no matter the app/fs/os" or choose your own adventure. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zero page reclaim with ZFS
On 12/29/2008 8:20 PM, Tim wrote: On Mon, Dec 29, 2008 at 6:09 PM, Torrey McMahon tmcmah...@yahoo.com wrote: There are some mainframe filesystems that do such things. I think there was also an STK array - Iceberg[?] - that had similar functionality. However, why would you use ZFS on top of HDP? If the filesystem lets you grow dynamically, and the OS lets you add storage dynamically or grow the LUNs when the array does... what does HDP get you? Serious question, as I get asked it all the time and I can't come up with a good answer outside of procedural things such as, "We don't like to bother the storage guys" or, "We thin provision everything no matter the app/fs/os" or choose your own adventure.

Assign your database admin who swears he needs 2TB day one a 2TB LUN. And 6 months from now when he's really only using 200GB, you aren't wasting 1.8TB of disk on him.

I run into the same thing, but once I say "I can add more space without downtime" they tend to smarten up. Also, ZFS will not reuse blocks in a, for lack of better words, economical fashion. If you throw them a 2TB LUN, ZFS will allocate blocks all over the LUN even when they're only using a small fraction. Unless you have, as the original poster mentioned, an empty-block reclaim, you'll have problems. UFS can show the same results btw. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zero page reclaim with ZFS
On 12/29/2008 10:36 PM, Tim wrote: On Mon, Dec 29, 2008 at 8:52 PM, Torrey McMahon tmcmah...@yahoo.com wrote: On 12/29/2008 8:20 PM, Tim wrote: I run into the same thing, but once I say "I can add more space without downtime" they tend to smarten up. Also, ZFS will not reuse blocks in a, for lack of better words, economical fashion. If you throw them a 2TB LUN, ZFS will allocate blocks all over the LUN even when they're only using a small fraction. Unless you have, as the original poster mentioned, an empty-block reclaim, you'll have problems. UFS can show the same results btw.

I'm not arguing anything towards his specific scenario. You said you couldn't imagine why anyone would ever want thin provisioning, so I told you why. Some admins do not have the luxury of trying to debate with other teams they work with as to why they should do things a different way than they want to ;) That speaks nothing of the change control needed to even get a LUN grown in some shops. It's out there, it's being used, it isn't a good fit for zfs.

Right... I called those "process issues". Perhaps "organizational issues" would have been better? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450
I'm pretty sure I understand the importance of a snapshot API. (You take the snap, then you do the backup or whatever.) My point is that, at least on my quick read, you can do most of the same things with the ZFS command line utilities. The relevant question would then be how stable that is for the type of work we're talking about.

Joseph Zhou wrote: Ok, Torrey, I like you, so one more comment before I go to bed -- Please go study EMC NetWorker 7.5, and why EMC can claim leadership in VSS support. Then, if you still don't understand the importance of VSS, just ask me in an open fashion, I will teach you. The importance of storage in system and application optimization can be very significant. You do coding, do you know what TGT from IBM in COBOL is, to be able to claim enterprise technology? If not, please study. http://publib.boulder.ibm.com/infocenter/pdthelp/v1r1/index.jsp?topic=/com.ibm.entcobol.doc_4.1/PGandLR/ref/rpbug10.htm Open Storage is a great concept, but we can only win with real advantages, not fake marketing lines. I hope everyone enjoyed the discussion. I did. zStorageAnalyst

- Original Message - From: Torrey McMahon [EMAIL PROTECTED] To: Joseph Zhou [EMAIL PROTECTED] Cc: Richard Elling [EMAIL PROTECTED]; William D. Hathaway [EMAIL PROTECTED]; [EMAIL PROTECTED]; zfs-discuss@opensolaris.org; [EMAIL PROTECTED] Sent: Sunday, December 07, 2008 2:40 AM Subject: Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450

Compared to hw raid only snapshots, ZFS is still, imho, easier to use. If you start talking about VSS, aka shadow copy for Windows, you're now at the fs level. I can see that VSS offers an API for 3rd parties to use but, as I literally just started reading about it, I'm not an expert. From a quick glance I think the ZFS feature set is comparable. Is there a C++ API to ZFS? Not that I know of. Do you need one? Can't think of a reason off the top of my head given the way the zpool/zfs commands work.

Joseph Zhou wrote: Torrey, now this is impressive, as in the old days with Sun Storage. Ok, ZFS PiT is only a software solution. The Windows VSS is not only a software solution, but also a 3rd party integration standard from MS. What's your comment on "ZFS PiT is better than MS PiT", in light of openness and 3rd-party integration??? Talking about garbage! z

- Original Message - From: Torrey McMahon [EMAIL PROTECTED] To: Richard Elling [EMAIL PROTECTED] Cc: Joseph Zhou [EMAIL PROTECTED]; William D. Hathaway [EMAIL PROTECTED]; [EMAIL PROTECTED]; zfs-discuss@opensolaris.org; [EMAIL PROTECTED] Sent: Sunday, December 07, 2008 1:58 AM Subject: Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450

Richard Elling wrote: Joseph Zhou wrote: Yeah? http://www.adaptec.com/en-US/products/Controllers/Hardware/sas/value/SAS-31605/_details/Series3_FAQs.htm Snapshot is a big deal? Snapshot is a big deal, but you will find most hardware RAID implementations are somewhat limited, as the above Adaptec only supports 4 snapshots and it is an optional feature. You will find many array vendors will be happy to charge lots of money for the snapshot feature. On top of that, since the ZFS snapshot is at the file system level, it's much easier to use. You don't have to quiesce the file system first or hope that when you take the snapshot you get a consistent data set. I've seen plenty of folks take hw raid snapshots without locking the file system first, let alone quiescing the app, and get garbage. 
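[The CLI equivalents in question, for the record (dataset names illustrative):

  # zfs snapshot tank/data@pre-backup                   (point-in-time, consistent on disk)
  # zfs send tank/data@pre-backup > /backup/data.zfs    (stream it off for the backup)
  # zfs rollback tank/data@pre-backup                   (recovery)

On-disk consistency comes for free; what VSS adds is coordination with applications so their in-memory state is quiesced too, which the ZFS CLI doesn't attempt.]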
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450
Ian Collins wrote: On Mon 08/12/08 08:14 , Torrey McMahon [EMAIL PROTECTED] sent: I'm pretty sure I understand the importance of a snapshot API. (You take the snap, then you do the backup or whatever) My point is that, at least on my quick read, you can do most of the same things with the ZFS command line utilities. The relevant question would then be how stable that is for the type of work we're talking about. Or through the APIs provided by libzfs. I'm not sure if those are published/supported as opposed to just being readable in the source. I think the ADM project is the droid we're looking for. Automatic Data Migration http://opensolaris.org/os/project/adm/ ADM is designed to use the Data Storage Management API (aka XDSM) as defined in the CAE Specification XDSM as documented by the Open Group. XDSM provides an Open Standard API to Data Migration Applications (DMAPI) to manage file backup and recovery, automatic file migration, and file replication. ADM will take advantage of these APIs as a privileged application and extension to ZFS. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450
Richard Elling wrote: Joseph Zhou wrote: Yeah? http://www.adaptec.com/en-US/products/Controllers/Hardware/sas/value/SAS-31605/_details/Series3_FAQs.htm Snapshot is a big deal? Snapshot is a big deal, but you will find most hardware RAID implementations are somewhat limited, as the above Adaptec only supports 4 snapshots and it is an optional feature. You will find many array vendors will be happy to charge lots of money for the snapshot feature.

On top of that, since the ZFS snapshot is at the file system level, it's much easier to use. You don't have to quiesce the file system first or hope that when you take the snapshot you get a consistent data set. I've seen plenty of folks take hw raid snapshots without locking the file system first, let alone quiescing the app, and get garbage. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450
Compared to hw raid only snapshots, ZFS is still, imho, easier to use. If you start talking about VSS, aka shadow copy for Windows, you're now at the fs level. I can see that VSS offers an API for 3rd parties to use but, as I literally just started reading about it, I'm not an expert. From a quick glance I think the ZFS feature set is comparable. Is there a C++ API to ZFS? Not that I know of. Do you need one? Can't think of a reason off the top of my head given the way the zpool/zfs commands work.

Joseph Zhou wrote: Torrey, now this is impressive, as in the old days with Sun Storage. Ok, ZFS PiT is only a software solution. The Windows VSS is not only a software solution, but also a 3rd party integration standard from MS. What's your comment on "ZFS PiT is better than MS PiT", in light of openness and 3rd-party integration??? Talking about garbage! z

- Original Message - From: Torrey McMahon [EMAIL PROTECTED] To: Richard Elling [EMAIL PROTECTED] Cc: Joseph Zhou [EMAIL PROTECTED]; William D. Hathaway [EMAIL PROTECTED]; [EMAIL PROTECTED]; zfs-discuss@opensolaris.org; [EMAIL PROTECTED] Sent: Sunday, December 07, 2008 1:58 AM Subject: Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450

Richard Elling wrote: Joseph Zhou wrote: Yeah? http://www.adaptec.com/en-US/products/Controllers/Hardware/sas/value/SAS-31605/_details/Series3_FAQs.htm Snapshot is a big deal? Snapshot is a big deal, but you will find most hardware RAID implementations are somewhat limited, as the above Adaptec only supports 4 snapshots and it is an optional feature. You will find many array vendors will be happy to charge lots of money for the snapshot feature. On top of that, since the ZFS snapshot is at the file system level, it's much easier to use. You don't have to quiesce the file system first or hope that when you take the snapshot you get a consistent data set. I've seen plenty of folks take hw raid snapshots without locking the file system first, let alone quiescing the app, and get garbage. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Tuning ZFS for Sun Java Messaging Server
You may want to ask your SAN vendor if they have a setting you can make to no-op the cache flush. That way you don't have to worry about the flush behavior if you change/add different arrays.

Adam N. Copeland wrote: Thanks for the replies. It appears the problem is that we are I/O bound. We have our SAN guy looking into possibly moving us to faster spindles. In the meantime, I wanted to implement whatever was possible to give us breathing room. Turning off atime certainly helped, but we are definitely not completely out of the drink yet. I also found that disabling the ZFS cache flush as per the Evil Tuning Guide was a huge boon, considering we're on a battery-backed (non-Sun) SAN. Thanks, Adam

Richard Elling wrote: As it happens, I'm currently involved with a project doing some performance analysis for this... but it is currently a WIP. Comments below.

Robert Milkowski wrote: Hello Adam, Tuesday, October 21, 2008, 2:00:46 PM, you wrote:

ANC> We're using a rather large (3.8TB) ZFS volume for our mailstores on a JMS setup. Does anybody have any tips for tuning ZFS for JMS? I'm looking for even the most obvious tips, as I am a bit of a novice. Thanks,

Well, it's kind of a broad topic and it depends on the specific environment.

Then do not tune for the sake of tuning - try to understand your problem first.

Nevertheless you should consider things like (random order):

1. RAID level - you will probably end up with relatively small random I/Os - generally avoid RAID-Z

Of course it could be that RAID-Z in your environment is perfectly fine. There are some write latency-sensitive areas that will begin to cause consternation for large loads. Storage tuning is very important in this space. In our case, we're using an ST6540 array which has a decent write cache and a fast back-end.

2. Depending on your workload and disk subsystem, ZFS's slog on SSD could help to improve performance

My experiments show that this is not the main performance issue for large message volumes.

3. Disable atime updates on the zfs file system

Agree. JMS doesn't use it, so it just means extra work.

4. Enabling compression like lzjb in theory could help - depends on how well your data would compress, how much CPU you have left, and whether you are mostly I/O bound

We have not experimented with this yet, but know that some of the latency-sensitive writes are files with a small number of bytes, which will not compress to be less than one disk block. [opportunities for cleverness are here :-)] There may be a benefit for the message body, but in my tests we are not concentrating on that at this time.

5. ZFS recordsize - probably not, as in most cases when you read anything from an email you will probably read the entire mail anyway. Nevertheless this could be easily checked with dtrace.

This does not seem to be an issue.

6. IIRC JMS keeps an index/db file per mailbox - so just maybe L2ARC on a large SSD would help, assuming it would nicely cache these files - would need to be simulated/tested

This does not seem to be an issue, but in our testing the message stores have plenty of memory, and hence ARC size is on the order of tens of GBytes.

7. Disabling vdev pre-fetching in ZFS could help - see the ZFS Evil Tuning Guide

My experiments showed no benefit from disabling pre-fetch. However, there are multiple layers of pre-fetching at play when you are using an array, and we haven't done a complete analysis on this yet. It is clear that we are not bandwidth limited, so prefetching may not hurt. 
Except for #3 and maybe #7, first identify what your problem is and what you are trying to fix.

Yep. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
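[Item 3 from the list above is the one-liner with no real downside for a mail store (dataset name illustrative):

  # zfs set atime=off tank/mailstore

The rest are worth measuring before touching, as the thread says.]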
Re: [zfs-discuss] Tuning ZFS for Sun Java Messaging Server
Richard Elling wrote: Adam N. Copeland wrote: Thanks for the replies. It appears the problem is that we are I/O bound. We have our SAN guy looking into possibly moving us to faster spindles. In the meantime, I wanted to implement whatever was possible to give us breathing room. Turning off atime certainly helped, but we are definitely not completely out of the drink yet. I also found that disabling the ZFS cache flush as per the Evil Tuning Guide was a huge boon, considering we're on a battery-backed (non-Sun) SAN.

Really? Which OS version are you on? This should have been fixed in Solaris 10 5/08 (it is a fix in the [s]sd driver). Caveat: there may be some devices which do not properly negotiate the SYNC_NV bit. In my tests, using Solaris 10 5/08, disabling the cache flush made zero difference. PSARC 2007/053

If I read through the code correctly: if the array doesn't respond to the device inquiry, you haven't made an entry in sd.conf for the array, and it isn't hard-coded in the sd.c table - I think there are only two in that state - then you'd have to disable the cache flush. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] X4540
Spencer Shepler wrote: On Jul 10, 2008, at 7:05 AM, Ross wrote: Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or using NVRAM with RAID (which raid controllers have been doing for years) would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test. How quickly they forget. Take a look at the Prestoserve User's Guide for a refresher... http://docs.sun.com/app/docs/doc/801-4896-11

Or Fast Write Cache: http://docs.sun.com/app/docs/coll/fast-write-cache2.0 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Deferred Frees
I'm doing some simple testing of ZFS block reuse and was wondering when deferred frees kick in. Is it on some sort of timer to ensure data consistency? Does another routine call it? Would something as simple as sync(1M) get the free block list written out so future allocations could use the space? ... or am I way off in the weeds? :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
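[For anyone poking at the same thing: deferred frees are processed as part of transaction group sync, so watching spa_sync() fire shows the cadence (a sketch; run as root):

  # dtrace -n 'fbt::spa_sync:entry { @syncs[probefunc] = count(); }'

sync(1M) will nudge a txg out, but txgs sync on their own every few seconds anyway.]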
Re: [zfs-discuss] ZFS conflict with MAID?
A Darren Dunham wrote: On Tue, Jun 10, 2008 at 05:32:21PM -0400, Torrey McMahon wrote: However, some apps will probably be very unhappy if i/o takes 60 seconds to complete. It's certainly not uncommon for that to occur in an NFS environment. All of our applications seem to hang on just fine for minor planned and unplanned outages. Would the apps behave differently in this case? (I'm certainly not thinking of a production database for such a configuration.)

Some applications have their own internal timers that track i/o time and, if it doesn't complete in time, will error out. I don't know which part of the stack the timer was in, but I've seen an Oracle RAC cluster on QFS time out much faster than the SCSI retries normally allow for. (I think it was Oracle in that case...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS conflict with MAID?
Richard Elling wrote: Tobias Exner wrote: Hi John, I've done some tests with a SUN X4500 with zfs and MAID using the powerd of Solaris 10 to power down the disks which weren't access for a configured time. It's working fine... The only thing I run into was the problem that it took roundabout a minute to power on 4 disks in a zfs-pool. The problem seems to be that the powerd starts the disks sequentially. Did you power down disks or spin down disks? It is relatively easy to spin down (or up) disks with luxadm stop (start). If a disk is accessed, then it will spin itself up. By default, the timeout for disk response is 60 seconds, and most disks can spin up in less than 60 seconds. However, some apps will probably be very unhappy if i/o takes 60 seconds to complete. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
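[The by-hand versions mentioned (device path illustrative):

  # luxadm stop /dev/rdsk/c1t5d0s2     (spin the drive down)
  # luxadm start /dev/rdsk/c1t5d0s2    (spin it up; any I/O to the disk also wakes it)]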
Re: [zfs-discuss] ZFS and Sun Disk arrays - Opinions?
The release should be out any day now. I think it's being pushed to the external download site whilst we type/read.

Andy Lubel wrote: The limitation existed in every Sun-branded Engenio array we tested - 2510, 2530, 2540, 6130, 6540. This limitation is on volumes. You will not be able to present a LUN larger than that magical 1.998TB. I think it is a combination of both CAM and the firmware. Can't do it with sscs either... Warm and fuzzy: Sun engineers told me they would have a new release of CAM (and firmware bundle) in late June which would resolve this limitation. Or just do a ZFS (or even SVM) setup like Bob and I did. It's actually pretty nice because the traffic will split to both controllers, giving you theoretically more throughput so long as MPxIO is functioning properly. Only (minor) downside is parity is being transmitted from the host to the disks rather than living on the controller entirely. -Andy

From: [EMAIL PROTECTED] on behalf of Torrey McMahon Sent: Mon 5/19/2008 1:59 PM To: Bob Friesenhahn Cc: zfs-discuss@opensolaris.org; Kenny Subject: Re: [zfs-discuss] ZFS and Sun Disk arrays - Opinions?

Bob Friesenhahn wrote: On Mon, 19 May 2008, Kenny wrote: Bob M. - Thanks for the heads up on the 2 (1.998) TB LUN limit. This has me a little concerned, esp. since I have 1 TB drives being delivered! Also thanks for the scsi cache flushing heads up, yet another item to look up! grin

I am not sure if this LUN size limit really exists, or if it exists, in which cases it actually applies. On my drive array, I created a 3.6TB RAID-0 pool with all 12 drives included during the testing process. Unfortunately, I don't recall if I created a LUN using all the space. I don't recall ever seeing mention of a 2TB limit in the CAM user interface or in the documentation.

The Solaris LUN limit is gone if you're using Solaris 10 and recent patches. The array limit(s) are tied to the type of array you're using. (Which type is this again?) CAM shouldn't be enforcing any limits of its own but only reporting back when the array complains. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Backup-ing up ZFS configurations
eric kustarz wrote: So even with the above, if you add a vdev, slog, or l2arc later on, that can be lost via the history being a ring buffer. There's an RFE for essentially taking your current 'zpool status' output and outputting a config (one that could be used to create a brand new pool): 6276640 zpool config

I'm surprised there haven't been more hands raised for this one. It would be very handy for a change management process, setting up DR sites, testing, etc. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
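[Until that RFE shows up, the nearest approximations (pool name illustrative):

  # zpool status tank        (read the vdev tree off by eye)
  # zpool history -l tank    (long format: every command with user/host, ring buffer permitting)]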
Re: [zfs-discuss] Round-robin NFS protocol with ZFS
Tim wrote: He wants to mount the ZFS filesystem (I'm assuming off of a backend SAN storage array) to two heads, then round-robin NFS connections between the heads to essentially *double* the throughput. pNFS is the droid you are looking for. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] SunMC module for ZFS
Anyone have a pointer to a general ZFS health/monitoring module for SunMC? There isn't one baked into SunMC proper which means I get to write one myself if someone hasn't already done it. Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Case #65841812
I'm not an Oracle expert, but I don't think Oracle checksumming can correct data. If you have ZFS checksums enabled, and you're mirroring in your zpools, then ZFS can self-correct as long as the copy on the other half of the mirror is good.

Mertol Ozyoney wrote: Don't take my words as expert advice, as I am a newbie when it comes to ZFS. If I am not mistaken, if you are only using Oracle on the particular zpool, the Oracle checksum offers better protection against data corruption. You can disable ZFS checksums. Best regards Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL PROTECTED]

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Macdonald - Sun Microsystem Sent: 01 Şubat 2008 Cuma 15:31 To: zfs-discuss@opensolaris.org; [EMAIL PROTECTED] Subject: [zfs-discuss] Case #65841812

Below is my customer's issue. I am stuck on this one. I would appreciate it if someone could help me out. Thanks in advance!

ZFS checksum feature: I/O checksumming is one of the main ZFS features; however, there is also block checksumming done by Oracle. This is good when utilizing UFS, since UFS does not do checksums, but with ZFS it can be a waste of CPU time. Suggestions have been made to change the Oracle db_block_checksum parameter to false, which may give a significant performance gain on ZFS. What is Sun's stance and/or suggestion on making this change on the ZFS side as well as making the changes on the Oracle side? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
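[Worth noting: when ZFS does repair a block from the good side of a mirror, it shows up in the error counters (pool name illustrative):

  # zpool status -v tank    (the READ/WRITE/CKSUM columns count per-device errors)

Oracle's db_block_checksum can detect corruption, but only ZFS-managed redundancy can transparently repair it.]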
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Kyle McDonald wrote: Vincent Fox wrote: So the point is, a JBOD with a flash drive in one (or two to mirror the ZIL) of the slots would be a lot SIMPLER. We've all spent the last decade or two offloading functions into specialized hardware, which has turned into these massive, unnecessarily complex things. I don't want to go to a new training class every time we buy a new model of storage unit. I don't want to have to set up a new server on my private network to run the Java GUI management software for that array and all the other BS that array vendors put us through. I just want storage. Good point.

You still need interfaces, of some kind, to manage the device. Temp sensors? Drive FRU information? All that information has to go out, and some in, over an interface of some sort. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS under VMware
Lewis Thompson wrote: Hello, I'm planning to use VMware Server on Ubuntu to host multiple VMs, one of which will be a Solaris instance for the purposes of ZFS. I would give the ZFS VM two physical disks for my zpool, e.g. /dev/sda and /dev/sdb, in addition to the VMware virtual disk for the Solaris OS. Now I know that Solaris/ZFS likes to have total control over the disks to ensure writes are flushed as and when it is ready for them to happen, so I wonder if anybody can comment on what implications using the disks in this way (i.e. through Linux and then VMware) has on the control Solaris has over these disks? By using a VM will I be missing out in terms of reliability? If so, can anybody suggest any improvements I could make while still allowing Solaris/ZFS to run in a VM?

I'm not sure what the perf aspects would be, but it depends on what the VMware software passes through. Does it ignore cache sync commands in its i/o stack? Got me. You won't be missing out on reliability, but you will be introducing more layers in the stack where something could go wrong. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] NFS performance on ZFS vs UFS
Robert Milkowski wrote: Hello Darren, DJM> BTW there isn't really any such thing as disk corruption - there is data corruption :-) Well, if you scratch it hard enough :) http://www.philohome.com/hammerhead/broken-disk.jpg :-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iscsi on zvol
Jim Dunham wrote: This raises a key point that you should be aware of: ZFS does not support shared access to the same ZFS filesystem.

...unless you put NFS or something on top of it. (I always forget that part myself.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
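[The NFS-on-top version is a one-liner (dataset name illustrative):

  # zfs set sharenfs=on tank/shared

All writers then go through the one NFS server, rather than two hosts writing the same ZFS filesystem directly - which is the unsupported part.]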
Re: [zfs-discuss] ZFS via Virtualized Solaris?
Peter Schuller wrote: From what I read, one of the main things about ZFS is "Don't trust the underlying hardware." If this is the case, could I run Solaris under VirtualBox or under some other emulated environment and still get the benefits of ZFS such as end-to-end data integrity? You could probably answer that question by changing the phrase to "Don't trust the underlying virtual hardware!" ZFS doesn't care if the storage is virtualised or not. But worth noting is that, as with for example hardware RAID, if you intend to take advantage of the self-healing properties of ZFS with multiple disks, you must expose the individual disks to your mirror/raidz/raidz2 individually through the virtualization environment and use them in your pool.

Or expose enough LUNs to take advantage of it. Two raid LUNs in a mirror, for example. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does Oracle support ZFS as a file system with Oracle RAC?
Louwtjie Burger wrote: On 12/19/07, David Magda [EMAIL PROTECTED] wrote: On Dec 18, 2007, at 12:23, Mike Gerdts wrote: 2) Database files - I'll lump redo logs, etc. in with this. In Oracle RAC these must live on a shared-rw (e.g. clustered VxFS, NFS) file system. ZFS does not do this. If you can use NFS, can't you put things on ZFS and then export? Is it a good idea to put an Oracle database on the other end of an NFS mount? (performance-wise)

Depends on the characteristics of your network and what amount of performance you need. (As with most things, it depends...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [storage-discuss] SAN arrays with NVRAM cache : ZIL and zfs_nocacheflush
Nicolas Dorfsman wrote: On 27 Nov 07 at 16:17, Torrey McMahon wrote: According to the array vendor, the 99xx arrays no-op the cache flush command. No need to set the /etc/system flag. http://blogs.sun.com/torrey/entry/zfs_and_99xx_storage_arrays Perfect! Thanks Torrey.

Just realize that the HDS midrange, which Sun does not resell, are different beasts than the 99xx line from Sun. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun's storage product roadmap?
The profit stuff has been under NDA for a while, but we started telling the street a while back and they seem to like the idea. :)

Selim Daoud wrote: wasn't that NDA info?? s-

On 10/18/07, Torrey McMahon [EMAIL PROTECTED] wrote: MC wrote: Sun's storage strategy: 1) Finish Indiana and distro constructor 2) (ship stuff using ZFS-Indiana) 3) Success 4) Profit :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun's storage product roadmap?
MC wrote: Sun's storage strategy: 1) Finish Indiana and distro constructor 2) (ship stuff using ZFS-Indiana) 3) Success 4) Profit :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The ZFS-Man.
Jonathan Edwards wrote: On Sep 21, 2007, at 14:57, eric kustarz wrote: Hi. I gave a talk about ZFS during EuroBSDCon 2007, and because it won the best talk award and some find it funny, here it is: http://youtube.com/watch?v=o3TGM0T1CvE a bit better version is here: http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf Looks like Jeff has been working out :) my first thought too: http://blogs.sun.com/bonwick/resource/images/bonwick.portrait.jpg funny - i always pictured this as UFS-man though: http://www.benbakerphoto.com/business/47573_8C-after.jpg but what's going on with the sheep there?

Got me, but they do look kind of nervous. (Happy Friday, folks...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Solaris 10 Update 4 Patches
Did you upgrade your pools? zpool upgrade -a

John-Paul Drawneek wrote: err, I installed the patch and am still on zfs 3? solaris 10 u3 with kernel patch 120011-14 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mirrored zpool across network
Mark wrote: Hi All, I'm just wondering (I figure you can do this but don't know what hardware and stuff I would need) if I can set up a mirror of a raidz zpool across a network. Basically, the setup is a large volume of Hi-Def video being streamed from a camera onto an editing timeline. This will be written to a network share. Due to the large amounts of data, ZFS is a really good option for us. But we need a backup. We need to do it on generic hardware (I was thinking AMD64 with an array of large 7200rpm hard drives), and therefore I think I'm going to have one box mirroring the other box. They will be connected by gigabit ethernet. So my question is: how do I mirror one raidz array across the network to the other?

rsync? zfs send/recv? AVS? iSCSI targets on the two boxes? Lots of ways to do it. Depends what your definition of backup is. Time based? Extra redundancy? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
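[Of those options, periodic send/receive is the usual fit when "backup" means point-in-time copies (dataset and host names illustrative):

  # zfs snapshot tank/video@nightly
  # zfs send tank/video@nightly | ssh backupbox zfs recv -F backup/video

AVS gets closer to a continuous mirror, rsync copies at the file level, and exporting iSCSI targets from the second box would let the first mirror onto them directly.]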
[zfs-discuss] Snapshots and worm devices
Has anyone thought about using snapshots and WORM devices? In theory, you'd have to keep the WORM drive out of the pool, or in as a special device, and it would have to be a full snapshot even though we really don't have those. Any plans in this area? I could take a snapshot, clone it, then copy it to the WORM device with cpio or friends, but that adds time and the possibility of error(s). Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
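[A send stream might cut out the clone-and-copy steps (names illustrative):

  # zfs snapshot tank/data@worm
  # zfs send tank/data@worm > /worm/data-20070801.zfs

The trade-off: restoring a stream needs zfs recv, while the cpio route leaves the files directly readable off the WORM media.]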
Re: [zfs-discuss] ZFS and powerpath
Carisdad wrote: Peter Tribble wrote:

# powermt display dev=all
Pseudo name=emcpower0a
CLARiiON ID=APM00043600837 []
Logical device ID=600601600C4912003AB4B247BA2BDA11 [LUN 46]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
--------------- Host ---------------  - Stor -  -- I/O Path --  -- Stats ---
###  HW Path                I/O Paths  Interf.  Mode    State   Q-IOs Errors
==============================================================================
3073 [EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 c2t500601613060099Cd1s0 SP A1 active alive 0 0
3073 [EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 c2t500601693060099Cd1s0 SP B1 active alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 c3t500601603060099Cd1s0 SP A0 active alive 0 0
3072 [EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 c3t500601683060099Cd1s0 SP B0 active alive 0 0

If it helps at all. We're having a similar problem. Any LUNs configured with their default owner set to SP B don't get along with ZFS. We're running on a T2000, with Emulex cards and the ssd driver. MPxIO seems to work well for most cases, but the SAN guys are not comfortable with it.

Are you using the top-level powerpath device? Is the CLARiiON in an auto-trespass mode where any i/o going down the alt path will cause the LUNs to move? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and powerpath
Darren Dunham wrote: If it helps at all. We're having a similar problem. Any LUNs configured with their default owner set to SP B don't get along with ZFS. We're running on a T2000, with Emulex cards and the ssd driver. MPxIO seems to work well for most cases, but the SAN guys are not comfortable with it. Are you using the top-level powerpath device? Is the CLARiiON in an auto-trespass mode where any i/o going down the alt path will cause the LUNs to move? My previous experience with powerpath was that it rode below the Solaris device layer. So you couldn't cause trespass by using the wrong device. It would just go to powerpath, which would choose the link to use on its own. Is this not true or has it changed over time?

I haven't looked at powerpath for some time, but it used to be the opposite. The powerpath node sat on top of the actual device paths. One of the selling points of mpxio is that it doesn't have that problem. (At least for devices it supports.) Most of the multipath software had that same limitation. However, I'm not an expert on powerpath by any stretch of the imagination. I just took a quick look at the powerpath manual (4.0 version) and it says you can now use both types, which seems a little confusing. Again, I'd be interested to see if using the pseudo-device works better... not to mention how it works using the direct path disk entry. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] how to remove sun volume mgr configuration?
Bill Sommerfeld wrote: On Mon, 2007-07-16 at 18:19 -0700, Russ Petruzzelli wrote: Or am I just getting myself into shark-infested waters? Configurations that might be interesting to play with (emphasis here on play...): 1) use the T3's management CLI to reconfigure the T3 into two raid-0 volumes, and mirror them with ZFS. 2) if you have some JBODs available as well, use the T3 (which has a modest-sized battery-backed write cache in the controller) as a separate log device (that's a new feature introduced in a recent nevada build). Has the project that lets you specify an array as having a battery backup gone in yet? If not, then wouldn't the sync-cache problem be in play? I don't know if the T3 honors cache flush commands or sets the "I've got a stable cache" bit in the relevant SCSI mode page. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] how to remove sun volume mgr configuration?
James C. McPherson wrote: The T3B with fw v3.x (I think) and the T4 (aka the 6020 tray) allow more than two volumes, but you're still quite restricted in what you can do with them. You are limited to two raid groups, with slices on top of those raid groups presented as LUNs. I'd just stick with the raid groups and not go overboard with slices, because you can just [SNIP] You can use ZFS on that volume, but it will have no redundancy at the ZFS level, only at the disk level controlled by the T3. Well ... you could create two volumes on the array and mirror those using ZFS. Some might say that's a waste of space :) ... stick to R0 and then mirror with ZFS? At least the T3 will let you do that, as opposed to other storage arrays that only let you pick from R1 and R5. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and powerpath
Peter Tribble wrote: On 7/13/07, Alderman, Sean [EMAIL PROTECTED] wrote: I wonder what kind of card Peter's using and if there is a potential linkage there. We've got the Sun-branded Emulex cards in our SPARCs. I also wonder, if Peter were able to allocate an additional LUN to his system, whether or not he'd be able to create a pool on that new LUN. On a different continent, and I didn't buy it. Shows up as lpfc (is that Emulex?). I'm not sure that's related - I can see the LUNs and devices, it's just that zfs isn't happy. lpfc is the native Emulex driver. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and powerpath
[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/13/2007 02:21:52 PM: Peter Tribble wrote: I've not got that far. During an import, ZFS just pokes around - there doesn't seem to be an explicit way to tell it which particular devices or SAN paths to use. You can't tell it which devices to use in a straightforward manner, but you can tell it which directories to scan: zpool import [-d dir]. By default, it scans /dev/dsk. Does a truss of the zpool import show the powerpath devices being opened and read from? AFAIK powerpath does not really need to use the powerpath pseudo-devices -- they are just there for convenience. I would expect the drives to be readable from either the c1 devices or emc*. ZFS needs to use the top-level multipath device or bad things will probably happen on a failover or at initial zpool creation. For example: you'll try to use the device on two paths and cause a LUN failover to occur. MPxIO fixes a lot of these issues. I strongly suggest using MPxIO instead of powerpath, but sometimes powerpath is all you can use, if the array is new and MPxIO doesn't have the hooks for it ... yet. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
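One way to steer zpool import toward the pseudo-devices is to point -d at a directory that contains only them. A sketch, assuming the emcpower nodes live in /dev/dsk as usual; the directory name is arbitrary:

  # build a directory holding links to just the powerpath pseudo-devices
  mkdir /emcdev
  cd /dev/dsk
  for d in emcpower*; do ln -s /dev/dsk/$d /emcdev/$d; done

  # scan only that directory during the import
  zpool import -d /emcdev tank

That keeps ZFS from opening the individual path devices during the scan and tripping a trespass.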
Re: [zfs-discuss] Plans for swapping to part of a pool
I really don't want to bring this up, but ... why do we still tell people to use swap volumes? Would we have the same sort of issue with the dump device, such that we need to fix it anyway? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
Bryan Cantrill wrote: On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote: PSARC 2007/171 will be available in b68. Any documentation anywhere on how to take advantage of it? Some of the Sun storage arrays contain NVRAM. It would be really nice if the array NVRAM could be made available for ZIL storage. It depends on your array, of course, but in most arrays you can control the amount of write cache (i.e., NVRAM) dedicated to particular LUNs. So to use the new separate logging most effectively, you should take your array and dedicate all of your NVRAM to a single LUN that you then use as your separate log device. Your pool should then use a LUN or LUNs that do not have any NVRAM dedicated to them. On some of the new Sun midrange arrays you can disable cache for a LUN, but I've never seen hooks that let you dedicate a certain amount of cache to one LUN in particular. (None of the older midrange arrays let you do this.) Some of the high-end arrays, like the 9990, allow you to pin some data in cache. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
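For reference, the separate intent log itself is attached with a log vdev; a minimal sketch, with made-up device names, assuming bits that include PSARC 2007/171 (b68 or later):

  # pool data on the NVRAM-less LUNs, ZIL on the cache-backed LUN
  zpool create tank c2t0d0 c2t1d0 log c3t0d0

  # or retrofit a log device onto an existing pool
  zpool add tank log c3t0d0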
Re: [zfs-discuss] zfs space efficiency
The interesting collision is going to be file-system-level encryption vs. de-duplication, as the former makes the latter pretty difficult. dave johnson wrote: How other storage systems do it is by calculating a hash value for said file (or block), storing that value in a db, then checking every new file (or block) commit against the db for a match and, if found, replacing the file (or block) with a duplicate entry in the db. The most common non-proprietary hash calc for file-level deduplication seems to be the combination of SHA1 and MD5 together. Collisions have been shown to exist in MD5 and theorized to exist in SHA1 by extrapolation, but the probability of collisions occurring simultaneously in both is as small as the capacity of ZFS is large :) While computationally intense, this would be a VERY welcome feature addition to ZFS and, given the existing infrastructure within the filesystem already, while non-trivial by any means, it seems a prime candidate. I am not a programmer, so I do not have the expertise to spearhead such a movement, but I would think getting at least a placeholder Goals and Objectives page into the OZFS community pages would be a good start, even if movement on this doesn't come for a year or more. Thoughts? -=dave - Original Message - From: Gary Mills [EMAIL PROTECTED] To: Erik Trimble [EMAIL PROTECTED] Cc: Matthew Ahrens [EMAIL PROTECTED]; roland [EMAIL PROTECTED]; zfs-discuss@opensolaris.org Sent: Sunday, June 24, 2007 3:58 PM Subject: Re: [zfs-discuss] zfs space efficiency On Sun, Jun 24, 2007 at 03:39:40PM -0700, Erik Trimble wrote: Matthew Ahrens wrote: Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus abstracting it away from the userland app? A copyfile primitive would be great! It would solve the problem of having all those friends to deal with -- stat(), extended attributes, UFS ACLs, NFSv4 ACLs, CIFS attributes, etc. That isn't to say that it would have to be implemented in the kernel; it could easily be a library function. I'm with Matt. Having a copyfile library/sys call would be of significant advantage. In this case, we can't currently take advantage of the CoW ability of ZFS when doing 'cp A B' (as has been pointed out to me). 'cp' simply opens file A with read(), opens a new file B with write(), and then shuffles the data between the two. Now, if we had a copyfile(A,B) primitive, then the 'cp' binary would simply call this function and, depending on the underlying FS, it would get implemented differently. In UFS, it would work as it does now. For ZFS, it would work like a snapshot, where files A and B share data blocks (at least until someone starts to update either A or B). Isn't this technique an instance of `deduplication', which seems to be a hot idea in storage these days? I wonder if it could be done automatically, behind the scenes, in some fashion. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
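The hash-and-lookup scheme dave describes can be sketched at the file level with stock Solaris tools. This is purely illustrative - a flat file stands in for the hash database, and the script name and paths are made up:

  #!/bin/sh
  # hypothetical file-level dedup check: combined SHA1+MD5 key per file
  DB=/var/tmp/dedup.db
  touch $DB
  for f in "$@"; do
      key=`digest -a sha1 $f`:`digest -a md5 $f`
      match=`grep "^$key " $DB | head -1`
      if [ -n "$match" ]; then
          echo "$f is a duplicate of: $match"
      else
          echo "$key $f" >> $DB
      fi
  done

A real implementation would of course key on blocks inside the filesystem rather than whole files, which is where the non-trivial work lives.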
Re: [zfs-discuss] Re: ZFS - SAN and Raid
Gary Mills wrote: On Wed, Jun 20, 2007 at 12:23:18PM -0400, Torrey McMahon wrote: James C. McPherson wrote: Roshan Perera wrote: But Roshan, if your pool is not replicated from ZFS' point of view, then all the multipathing and raid controller backup in the world will not make a difference. James, I agree from the ZFS point of view. However, from the EMC or the customer point of view, they want to do the replication at the EMC level and not from ZFS. By replicating at the ZFS level they will lose some storage and it's doubling the replication. It's just that the customer is used to working with Veritas and UFS and they don't want to change their habits. I just have to convince the customer to use ZFS replication. That's a great shame, because if they actually want to make use of the features of ZFS such as replication, then they need to be serious about configuring their storage to play in the ZFS world, and that means replication that ZFS knows about. Also, how does replication at the ZFS level use more storage - I'm assuming raw block - than at the array level? SAN storage generally doesn't work that way. They use some magical redundancy scheme, which may be RAID-5 or WAFL, from which the Storage Administrator carves out virtual disks. These are best viewed as an array of blocks. All disk administration, such as replacing failed disks, takes place on the storage device without affecting the virtual disks. There's no need for disk administration or additional redundancy on the client side. If more space is needed on the client, the Storage Administrator simply expands the virtual disk by extending its blocks. ZFS needs to play nicely in this environment because that's what's available in large organizations that have centralized their storage. Asking for raw disks doesn't work. Are we talking about replication - I have a copy of my data on another system - or redundancy - I have a system where I can tolerate a local failure? ...and I understand the ZFS has to play nice with HW raid argument. :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS - SAN and Raid
Victor Engle wrote: On 6/20/07, Torrey McMahon [EMAIL PROTECTED] wrote: Also, how does replication at the ZFS level use more storage - I'm assuming raw block - than at the array level? Just to add to the previous comments: in the case where you have a SAN array providing storage to a host for use with ZFS, the SAN storage really needs to be redundant in the array AND the zpools need to be redundant pools. The reason the SAN storage should be redundant is that SAN arrays are designed to serve logical units. The logical units are usually allocated from a raid set, storage pool, or aggregate of some kind. The array-side pool/aggregate may include ten 300GB disks and may have 100+ LUNs allocated from it, for example. If redundancy is not used in the array-side pool/aggregate, then one disk failure will kill 100+ LUNs at once. That makes a lot of sense in configurations where an array is exporting LUNs built on raid volumes to a set of heterogeneous hosts. If you're direct-connected to a single box running ZFS, or a set of boxes running ZFS, you probably want to export something as close to the raw disks as possible while maintaining ZFS-level redundancy. (Like two R5 LUNs in a ZFS mirror.) Creating a raid set, carving out lots of LUNs, and then handing them all over to ZFS isn't going to buy you a lot and could cause performance issues. (LUN skew, for example.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
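The "two R5 LUNs in a ZFS mirror" layout mentioned above amounts to a one-liner; the device names are made up:

  # each device is a RAID-5 LUN exported by the array, ideally reached
  # through different controllers or trays
  zpool create tank mirror c2t0d0 c3t0d0

The array survives a disk failure inside either LUN, and ZFS can self-heal a corrupted block by reading the other side of the mirror.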
Re: [zfs-discuss] Re: ZFS - SAN and Raid
James C. McPherson wrote: Roshan Perera wrote: But Roshan, if your pool is not replicated from ZFS' point of view, then all the multipathing and raid controller backup in the world will not make a difference. James, I agree from the ZFS point of view. However, from the EMC or the customer point of view, they want to do the replication at the EMC level and not from ZFS. By replicating at the ZFS level they will lose some storage and it's doubling the replication. It's just that the customer is used to working with Veritas and UFS and they don't want to change their habits. I just have to convince the customer to use ZFS replication. Hi Roshan, that's a great shame, because if they actually want to make use of the features of ZFS such as replication, then they need to be serious about configuring their storage to play in the ZFS world, and that means replication that ZFS knows about. Also, how does replication at the ZFS level use more storage - I'm assuming raw block - than at the array level? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs and EMC
This sounds familiar ... like something about the PowerPath device not responding to the SCSI inquiry strings. Are you using the same version of PowerPath on both systems? Same type of array on both? Dominik Saar wrote: Hi there, I see strange behavior when I create a ZFS pool on an EMC PowerPath pseudo device. I can create a pool on emcpower0a, but not on emcpower2a - zpool core dumps with "invalid argument". That's my second machine with PowerPath and ZFS; the first one works fine, even ZFS/PowerPath and failover ... Is there anybody who has the same failure and a solution? :) Greets Dominik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] IRC: thought: irc.freenode.net #zfs for platform-agnostic or multi-platform discussion
Graham Perrin wrote: We have irc://irc.freenode.net/solaris and irc://irc.freenode.net/opensolaris and the other channels listed at http://blogs.sun.com/jimgris/entry/opensolaris_on_irc AND growing discussion of ZFS in Mac-, FUSE-, and Linux-oriented channels, BUT unless I'm missing something, no IRC channel for ZFS. Please: * which IRC channel will be best for discussion of ZFS from a multi-platform or platform-agnostic viewpoint? #zfs, though it looks like there aren't many people on at the moment ... and maybe someone had the same idea I did and just opened it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Toby Thain wrote: On 25-May-07, at 1:22 AM, Torrey McMahon wrote: Toby Thain wrote: On 22-May-07, at 11:01 AM, Louwtjie Burger wrote: On 5/22/07, Pål Baltzersen [EMAIL PROTECTED] wrote: What if your HW-RAID controller dies? In say 2 years or more ... What will read your disks as a configured RAID? Do you know how to (re)configure the controller or restore the config without destroying your data? Do you know for sure that a spare part and firmware will be identical, or at least compatible? How good is your service subscription? Maybe only scrapyards and museums will have what you had. =o Be careful when talking about RAID controllers in general. They are not created equal! ... Hardware RAID controllers have done the job for many years ... Not quite the same job as ZFS, which offers integrity guarantees that RAID subsystems cannot. Depends on the guarantees. Some RAID systems have built-in block checksumming. Which still isn't the same. Sigh. Yep, you get what you pay for. Funny how ZFS is free to purchase, isn't it? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Toby Thain wrote: On 22-May-07, at 11:01 AM, Louwtjie Burger wrote: On 5/22/07, Pål Baltzersen [EMAIL PROTECTED] wrote: What if your HW-RAID controller dies? In say 2 years or more ... What will read your disks as a configured RAID? Do you know how to (re)configure the controller or restore the config without destroying your data? Do you know for sure that a spare part and firmware will be identical, or at least compatible? How good is your service subscription? Maybe only scrapyards and museums will have what you had. =o Be careful when talking about RAID controllers in general. They are not created equal! ... Hardware RAID controllers have done the job for many years ... Not quite the same job as ZFS, which offers integrity guarantees that RAID subsystems cannot. Depends on the guarantees. Some RAID systems have built-in block checksumming. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] No zfs_nocacheflush in Solaris 10?
Albert Chin wrote: On Thu, May 24, 2007 at 11:55:58AM -0700, Grant Kelly wrote: I'm getting really poor write performance with ZFS on a RAID-5 volume (5 disks) from a StorageTek 6140 array. I've searched the web and these forums and it seems that this zfs_nocacheflush option is the solution, but I'm open to others as well. What type of poor performance? Is it because of ZFS? You can test this by creating a RAID-5 volume on the 6140, creating a UFS file system on it, and then comparing performance with what you get against ZFS. If it's ZFS, then you might want to check into modifying the 6140 NVRAM settings as mentioned in this thread: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html There is a fix in the works that doesn't involve modifying the NVRAM. (I don't have an estimate.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
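For completeness: on builds where the tunable exists, disabling ZFS's cache-flush requests is a one-line /etc/system entry. A sketch only - this is safe only when every device in every pool has a non-volatile (battery-backed) cache, and it applies system-wide:

  * /etc/system: tell ZFS not to issue SCSI synchronize-cache commands
  set zfs:zfs_nocacheflush=1

A reboot is required for the change to take effect.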
Re: [zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
I did say depends on the guarantees, right? :-) My point is that all HW RAID systems are not created equal. Nathan Kroenert wrote: Which has little benefit if the HBA or the array internals change the meaning of the message... That's the whole point of ZFS's checksumming - it's end to end... Nathan. Torrey McMahon wrote: Toby Thain wrote: On 22-May-07, at 11:01 AM, Louwtjie Burger wrote: On 5/22/07, Pål Baltzersen [EMAIL PROTECTED] wrote: What if your HW-RAID controller dies? In say 2 years or more ... What will read your disks as a configured RAID? Do you know how to (re)configure the controller or restore the config without destroying your data? Do you know for sure that a spare part and firmware will be identical, or at least compatible? How good is your service subscription? Maybe only scrapyards and museums will have what you had. =o Be careful when talking about RAID controllers in general. They are not created equal! ... Hardware RAID controllers have done the job for many years ... Not quite the same job as ZFS, which offers integrity guarantees that RAID subsystems cannot. Depends on the guarantees. Some RAID systems have built-in block checksumming. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: AVS replication vs ZFS send recieve for odd sized volume pairs
John-Paul Drawneek wrote: Yes, I am also interested in this. We can't afford two super-fast setups, so we are looking at having a huge pile of SATA to act as a real-time backup for all our streams. So what can AVS do, and what are its limitations? Would just using zfs send and receive do, or does AVS make it all seamless? Check out http://www.opensolaris.org/os/project/avs/Demos/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Lots of overhead with ZFS - what am I doing wrong?
Jonathan Edwards wrote: On May 15, 2007, at 13:13, Jürgen Keil wrote: Would you mind also doing:

ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1

to see the raw performance of the underlying hardware. This dd command is reading from the block device, which might cache data, and probably splits requests into maxphys-sized pieces (maxphys happens to be 56K on an x86 box). To increase this to, say, 8MB, add the following to /etc/system:

set maxphys=0x800000

and you'll probably want to increase sd_max_xfer_size as well (should be 256K on x86/x64) .. add the following to /kernel/drv/sd.conf:

sd_max_xfer_size=0x800000;

then reboot to get the kernel and sd tunings to take effect. --- .je btw - the defaults on sparc:

maxphys = 128K
ssd_max_xfer_size = maxphys
sd_max_xfer_size = maxphys

Maybe we should file a bug to increase the max transfer request sizes? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
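To confirm the tuning took after the reboot, one can read the live kernel value back; a quick check, nothing more:

  # print the current maxphys value from the running kernel (decimal)
  echo 'maxphys/D' | mdb -k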
Re: [zfs-discuss] Re: ZFS Support for remote mirroring
Anantha N. Srirama wrote: For whatever reason, EMC notes (on PowerLink) suggest that ZFS is not supported on their arrays. If one is going to use a ZFS filesystem on top of an EMC array, be warned about support issues. They should have fixed that in their matrices. It should say something like: EMC supports serving LUNs to ZFS. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Support for remote mirroring
Matthew Ahrens wrote: Aaron Newcomb wrote: Does ZFS support any type of remote mirroring? It seems at present my only two options to achieve this would be Sun Cluster or Availability Suite. I thought this functionality was in the works, but I haven't heard anything lately. You could put something together using iSCSI, or zfs send/recv. I think the definition of remote mirror is up for grabs here, but in my mind remote mirror means the remote node has an always up-to-date copy of the primary data set, modulo any transactions in flight. AVS, aka Remote Mirror, aka SNDR, is usually used for this kind of work on the host. Storage arrays have things like, ahem, Remote Mirror, TrueCopy, SRDF, etc. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Support for remote mirroring
Aaron Newcomb wrote: Does ZFS support any type of remote mirroring? It seems at present my only two options to achieve this would be Sun Cluster or Availability Suite. I thought that this functionality was in the works, but I haven't heard anything lately. AVS is working today. (See Jim Dunham's frequent posts.) Are you looking for something tied directly into ZFS or ??? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS Support for remote mirroring
Aaron Newcomb wrote: Terry, Yes. AVS is pretty expensive. If ZFS did this out of the box it would be a huge differentiator. I know ZFS does snapshots today, but if we could extend this functionality to work across distance then we would have something that could compete with expensive solutions from EMC, HP, IBM, NetApp, etc. And to do it with open source software ... even better. AVS is already open-sourced. Not sure as to the free part but given the code is out there ... http://www.opensolaris.org/os/project/avs/ http://www.opensolaris.org/os/project/avs/files/ for the files. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs boot image conversion kit is posted
Brian Hechinger wrote: On Fri, Apr 27, 2007 at 02:44:02PM -0700, Malachi de Ælfweald wrote: 2. ZFS mirroring can work without the metadb, but if you want the dump mirrored too, you need the metadb (I don't know if it needs to be mirrored, but I wanted both disks to be identical in case one died) I can't think of any real good reason you would need a mirrored dump device. The only place that would help you is if your main disk died between panic and next boot. ;) If you lose the primary drive, and your dump device points to the metadevice, then you wouldn't have to reset it. Also, most folks use the swap device for dumps. You wouldn't want to lose that on a live box. (Though honestly, I've never just yanked the swap device and seen if the system keels over.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Boot: Dividing up the name space
Mike Dotson wrote: On Sat, 2007-04-28 at 17:48 +0100, Peter Tribble wrote: On 4/26/07, Lori Alt [EMAIL PROTECTED] wrote: Peter Tribble wrote: [snip] Why do administrators do 'df' commands? It's to find out how much space is used or available in a single file system. That made sense when file systems each had their own dedicated slice, but now it doesn't make that much sense anymore. Unless you've assigned a quota to a zfs file system, space available is meaningful more at the pool level. True, but it's actually quite hard to get at the moment. It's easy if you have a single pool - it doesn't matter which line you look at. But once you have 2 or more pools (and that's the way it would work, I expect - a boot pool and 1 or more data pools) there's an awful lot of output you may have to read. This isn't helped by zpool and zfs giving different answers, with the one from zfs being the one I want. The point is that every filesystem adds additional output the administrator has to mentally filter. (For one thing, you have to map a directory name to a containing pool.) It's actually quite easy, and easier than the other alternatives (ufs, veritas, etc.):

# zfs list -rH -o name,used,available,refer rootdg

And now it's set up to be parsed by a script (-H), since the output is tabbed. The -r says to recursively display children of the parent, and -o says to display only the fields specified. (Output from one of my systems:)

blast(9): zfs list -rH -o name,used,available,refer rootdg
rootdg                 4.39G  44.1G  32K
rootdg/nvx_wos_62      4.38G  44.1G  503M
rootdg/nvx_wos_62/opt  793M   44.1G  793M
rootdg/nvx_wos_62/usr  3.01G  44.1G  3.01G
rootdg/nvx_wos_62/var  113M   44.1G  113M
rootdg/swapvol         16K    44.1G  16K

Even though the mount points are set up as legacy mount points, I know where each of them is mounted due to the vol name. And yes, this system has more than one pool:

blast(10): zpool list
NAME     SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
lpool    17.8G  11.4G  6.32G  64%  ONLINE  -
rootdg   49.2G  4.39G  44.9G   8%  ONLINE  -

With zfs, file systems are in many ways more like directories than what we used to call file systems. They draw from pooled storage. They have low overhead and are easy to create and destroy. File systems are sort of like super-functional directories, with quality-of-service control and cloning and snapshots. Many of the things that sysadmins used to have to do with file systems just aren't necessary or even meaningful anymore. And so maybe the additional work of managing more file systems is actually a lot smaller than you might initially think. Oh, I agree. The trouble is that sysadmins still have to work using their traditional tools, including their brains, which are tooled up for cases with a much lower filesystem count. What I don't see as part of this are new tools (or enhancements to existing tools) that make this easier to handle. Not sure I agree with this. Many times you end up dealing with multiple vxvols and file systems. Anything over 12 filesystems and you're in overload (at least for me ;) and I used my monitoring and scripting tools to filter that for me. Many of the systems I admin'd were set up quite differently based on use, functionality, and disk size. Most of my tools were set up to take most of this into consideration, along with the fact that we ran almost every flavor of UNIX possible, using the features of each OS as appropriate. Most of the tools will still work with zfs (if using df, etc.), but it actually makes things easier once you have a monitoring issue - running out of space, for example. 
Most tools have high and low water marks, so when a file system gets too full you get a warning. ZFS makes this much easier to admin, as you can see which file system is being the hog and go hunt directly in that file system instead of first having to find the file system - hence the debate of the all-in-one / slice versus breaking out the major OS filesystems. The benefit of an all-in-one / is that you didn't have to guess how much space you needed for each slice, so you could upgrade or add optional software without needing to grow/shrink the OS. The drawback: if you filled up the file system, you had to hunt where it was filling up - /dev, /usr, /var/tmp, /var, / ??? The benefit of multiple slices was that one fs didn't affect the others if you filled it up, and you could find the problem fs very easily; but if you estimated incorrectly, you had wasted disk space in one slice and not enough in another. ZFS gives you the benefit of both all-in-one and partitioned, as it draws from a single pool of storage but also allows you to find which fs is being the problem and lock it down with quotas and reservations. For example, backup tools are currently filesystem based. And this changes the scenario how?
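Locking a hog down with quotas and reservations, as described above, is one command each; reusing the dataset names from the listing earlier:

  # cap how much /var on that BE may ever consume
  zfs set quota=10G rootdg/nvx_wos_62/var

  # guarantee the rest of the BE a floor of space regardless of hogs
  zfs set reservation=1G rootdg/nvx_wos_62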
Re: [zfs-discuss] slow sync on zfs
Dickon Hood wrote: [snip] I'm currently playing with ZFS on a T2000 with 24x500GB SATA discs in an external array that presents as SCSI. After having much 'fun' with the Solaris SCSI driver not handling LUNs > 2TB ... That should work if you have the latest KJP and friends. (Actually, it should have been working for a while, so if not ...) What release are you on? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
Marion Hakanson wrote: [EMAIL PROTECTED] said: We have been combing the message boards, and it looks like there was a lot of talk about this interaction of zfs+nfs back in November and before, but since then I have not seen much. It seems the only fix up to that date was to disable the ZIL; is that still the case? Did anyone ever get closure on this? There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS learns to do that itself. See: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html The 6120 isn't the same as a 6130/6140/6540. The instructions referenced above won't work on a T3/T3+/6120/6320. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Testing of UFS, VxFS and ZFS
Anton B. Rang wrote: Second, VDBench is great for testing raw block I/O devices. I think a tool that does file system testing will get you better data. OTOH, shouldn't a tool that measures raw device performance be a reasonable reflection of Oracle performance when configured for raw devices? I don't know the current best practice for Oracle, but a lot of DBAs still use raw devices instead of files for their table spaces. Sure, once you characterize what the performance profile of the Oracle DB is (read% vs. write%, I/O size, etc.). VDBench is great for testing the raw device with whatever workload you want to test. Most of the Oracle folks I talk to mention they use filesystems these days ... but that isn't scientific by any stretch. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshot features
Frank Cusack wrote: On April 16, 2007 10:24:04 AM +0200 Selim Daoud [EMAIL PROTECTED] wrote: hi all, when doing several zfs snapshots of a given fs, there are dependencies between snapshots that complicate the management of snapshots. Is there a plan to ease these dependencies, so we can reach the snapshot functionality that is offered in other products such as Compellent (http://www.compellent.com/products/software/continuous_snapshots.aspx)? Compellent software allows you to set **retention periods** for different snapshots and will manage their migration or deletion automatically. Retention periods are pretty easily managed via cron. Yeah, but cron isn't easily managed by anything. :-P ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
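A minimal cron-driven retention sketch, assuming a single dataset with no children; the dataset name and the keep-7 window are arbitrary:

  #!/bin/sh
  # run daily from cron: take a dated snapshot, then prune all but the newest 7
  FS=tank/home
  zfs snapshot $FS@`date +%Y%m%d`
  zfs list -rH -t snapshot -o name $FS | sort -r | sed '1,7d' |
  while read snap; do
      zfs destroy $snap
  done

with a crontab entry along the lines of: 0 3 * * * /usr/local/bin/snaprotate.sh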
Re: [zfs-discuss] Testing of UFS, VxFS and ZFS
Tony Galway wrote: I had previously undertaken a benchmark that pits “out of box” performance of UFS via SVM, VxFS, and ZFS, but was waylaid due to some outstanding availability issues in ZFS. These have been taken care of, and I am once again undertaking this challenge on behalf of my customer. The idea behind this benchmark is to show a. how ZFS might displace the current commercial volume and file system management applications being used; b. the learning curve of moving from current volume management products to ZFS; c. performance differences across the different volume management products. VDBench is the test bed of choice, as this has been accepted by the customer as a telling and accurate indicator of performance. The last time I attempted this test, it was suggested that VDBench is not appropriate for testing ZFS. I cannot see that being a problem; VDBench is a tool – if it highlights performance problems, then I would think it is a very effective tool, so that we might better be able to fix those deficiencies. First, VDBench is a Sun-internal and partner-only tool, so you might not get much response on this list. Second, VDBench is great for testing raw block I/O devices. I think a tool that does file system testing will get you better data. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
Frank Cusack wrote: On April 11, 2007 11:54:38 AM +0200 Constantin Gonzalez Schmitz [EMAIL PROTECTED] wrote: Hi Mark, Mark J Musante wrote: On Tue, 10 Apr 2007, Constantin Gonzalez wrote: Has anybody tried it yet with a striped mirror? What if the pool is composed out of two mirrors? Can I attach devices to both mirrors, let them resilver, then detach them and import the pool from those? You'd want to export them, not detach them. Detaching will overwrite the vdev labels and make it un-importable. Thank you for the export/import idea; it does sound cleaner from a ZFS perspective, but comes at the expense of temporarily unmounting the filesystems. So, instead of detaching, would unplugging, then detaching work? I'm thinking something like this:

- zpool create tank mirror dev1 dev2 dev3
- {physically move dev3 to new box}
- zpool detach tank dev3

If we're talking about a 3rd device, added in order to migrate the data, why not just zfs send | zfs recv? Time? The reason people go the split-mirror route, at least in block land, is that once you split the volume you can export it someplace else and start using it. The same goes for constant replication, where you suspend the replication, take a copy, go start working on it, and restart the replication. (Lots of ways people do that one.) I think the requirement could be voiced as: I want an independent copy of my data on a secondary system, in a quick fashion, and I want to avoid using resources from the primary system. The fun part is that people will think in terms of current technologies, so you'll see split mirror, or volume copy, or TrueCopy mixed in for flavor. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
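Spelled out, the unplug-then-detach flow being debated might look like this. This is a sketch of the proposal, not a recommendation, and the device names are made up; note the order - the device is pulled before the detach, since detaching first rewrites the vdev labels and makes the removed side un-importable:

  # three-way mirror; dev3 is the side destined for the other box
  zpool create tank mirror dev1 dev2 dev3
  zpool status tank            # wait for any resilver to finish

  # physically pull dev3 and move it to the new box, and only then
  # drop the now-missing side from the mirror
  zpool detach tank dev3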
[zfs-discuss] Size taken by a zfs symlink
If I create a symlink inside a zfs file system and point the link to a file on a ufs file system on the same node how much space should I expect to see taken in the pool as used? Has this changed in the last few months? I know work is being done under 6516171 to make symlinks dittoable but I don't think that has gone back yet. (Has it?) Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: [storage-discuss] Detecting failed drive under MPxIO + ZFS
Robert Milkowski wrote: 2. MPxIO - it tries to fail the disk over to the second SP, but it looks like it tries forever (or for a very, very long time). After some time it should have generated a disk I/O failure... Are there any other hosts connected to this storage array? It looks like there might be another host ping-ponging the LUNs with this box. 3. I guess that in such a case Eric's proposal probably won't help and the real problem is with MPxIO - right? Well, I wouldn't say it's MPxIO's fault either. At least not at this point. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Boot support for the x86 platform
Richard Elling wrote: Cyril Plisko wrote: First of all I'd like to congratulate the ZFS boot team with the integration of their work into ON. Great job ! I am sure there are plenty of people waiting anxiously for this putback. I'd also like to suggest that the material referenced by HEADS UP message [1] be made available to non-SWAN folks as well. [1] http://opensolaris.org/os/community/on/flag-days/pages/2007032801/ This has already occurred. http://www.opensolaris.org/os/community/on/flag-days/61-65/ maybe you were too quick on the trigger? :-) The case materials aren't there. Also, I think Cyril meant the instructions on fs.central mentioned in the flag-day notice. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs send speed
Howdy folks. I've a customer looking to use ZFS in a DR situation. They have a large data store where they will be taking snapshots every N minutes or so, sending the difference of the snapshot and previous snapshot with zfs send -i to a remote host, and in case of DR firing up the secondary. However, I've seen a few references to the speed of zfs send being, well, a bit slow. Anyone want to comment on the current speed of zfs send? Any recent changes or issues found in this area? Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send speed
Matthew Ahrens wrote: Torrey McMahon wrote: Howdy folks. I've a customer looking to use ZFS in a DR situation. They have a large data store where they will be taking snapshots every N minutes or so, sending the difference of the snapshot and previous snapshot with zfs send -i to a remote host, and in case of DR firing up the secondary. Cool! I sure hope so. ;-) However, I've seen a few references to the speed of zfs send being, well, a bit slow. Anyone want to comment on the current speed of zfs send? Any recent changes or issues found in this area? What bits are you running? I made some recent improvements (6490104, fixed in build 53, targeted for s10u4). There are still a few issues, but by and large, performance should be very good. Can you describe what problem you're experiencing? How much data, how many files, how big a stream, what transport, how long it takes, are you seeing lots of CPU or disk activity on the sending or receiving side when it's slow? I'm only doing an initial investigation now, so I have no test data at this point. The reason I asked (I should have tacked this on at the end of the last email) was a blog entry stating that zfs send was slow: http://www.lethargy.org/~jesus/archives/80-ZFS-send-trickle..html Looking back through the discuss archives I didn't see anything else mentioned, but some others mentioned it to me offline as well. It could be we all read the same blog entry, so I figured I'd ask if anyone had seen such behavior recently. Hopefully I can get a test bed set up fairly quickly and see how it works myself. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS party - PANIC collection
Gino Ruopolo wrote: Conclusion: after a day of tests, we are inclined to think that ZFS doesn't work well with MPxIO. What kind of array is this? If it is not a Sun array, then how are you configuring MPxIO to recognize the array? We are facing the same problems with a JBOD (EMC DAE2), a StorageWorks EVA, and an old StorageWorks EMA. What makes you think that these arrays work with MPxIO? Not every array automatically works. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
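For a third-party array, MPxIO generally has to be told about the device explicitly; on Solaris 10 of this era that meant an entry in /kernel/drv/scsi_vhci.conf. A sketch only - the vendor/product strings below are placeholders and must match the array's actual SCSI inquiry data, with the vendor ID padded to eight characters:

  # /kernel/drv/scsi_vhci.conf (excerpt)
  device-type-scsi-options-list =
      "VENDOR  PRODUCT", "symmetric-option";
  symmetric-option = 0x1000000;

Without an entry like this (or built-in support), scsi_vhci won't claim the paths, and odd behavior under failover shouldn't be pinned on either ZFS or MPxIO.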
Re: [zfs-discuss] Google paper on disk reliability
Richard Elling wrote: Akhilesh Mritunjai wrote: I believe the word has gone around already: Google engineers have published a paper on disk reliability. It might supplement the ZFS FMA integration and, well, all the numerous debates on spares etc. over here. Good paper. They validate the old saying: complex systems fail in complex ways. We've also done some internal (Sun) studies which cast doubt on the ability of SMART to predict failures, which is why we were never really fans of turning it on. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS with SAN Disks and mutipathing
Richard Elling wrote: JS wrote: I'm using ZFS on both EMC and Pillar arrays with PowerPath and MPxIO, respectively. Both work fine - the only caveat is to drop your sd queue depth to around 20 or so, otherwise you can run into an ugly display of bus resets. This is sd_max_throttle or ssd_max_throttle. The problem is that the host can easily overrun slow storage devices; this will reduce the load on the storage device. Consult the storage configuration guidelines for recommended values (default = 256 outstanding commands; in the old days EMC recommended 20). Yes, we'd all like this problem to go away. Another note: this drops the queue size for all devices that use the sd or ssd driver. I'm still not sure why EMC/HDS/Pillar boxes can't send a queue-full response back when they start to fill up, like other storage arrays do. It gets even worse when you have to do the HDS math to set all your hosts to some low queue size. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
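Setting the throttle is an /etc/system change plus a reboot; the value 20 below is just the old EMC guidance mentioned above, not a universal recommendation, and it applies to every sd/ssd device on the host:

  * /etc/system: limit outstanding commands per LUN (sd and ssd paths)
  set sd:sd_max_throttle=20
  set ssd:ssd_max_throttle=20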
Re: [zfs-discuss] Is ZFS file system supports short writes ?
Robert Milkowski wrote: Hello dudekula, Thursday, February 15, 2007, 11:08:26 AM, you wrote: Hi all, please let me know about ZFS support for short writes? And what are short writes? http://www.pittstate.edu/wac/newwlassignments.html#ShortWrites :-P ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] number of lun's that zfs can handle
Claus Guttesen wrote: Our main storage is an HDS 9585V Thunder with VxFS and RAID-5 on 400 GB SATA disks handled by the storage system. If I were to migrate to ZFS, that would mean 390 JBODs. How so? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper Origins Q
Richard Elling wrote: One of the benefits of ZFS is that not only is head synchronization not needed, but also block offsets do not have to be the same. For example, in a traditional mirror, block 1 on device 1 is paired with block 1 on device 2. In ZFS, this 1:1 mapping is not required. I believe this will result in ZFS being more resilient to disks with multiple block failures. In order for a traditional RAID to implement this, it would basically need to [re]invent a file system. We had this fixed in T3 land a while ago, so I think most storage arrays don't do the 1:1 mapping anymore. It's striped down the drives. In theory, you could lose more than one drive in a T3 mirror and still maintain data in certain situations. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
Dale Ghent wrote: Yeah sure it might eat into STK profits, but one will still have to go there for redundant controllers. Repeat after me: There is no STK. There is only Sun. 8-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hot spares - in standby?
Richard Elling wrote: Good question. If you consider that mechanical wear-out is what ultimately causes many failure modes, then the argument can be made that a spun-down disk should last longer. The problem is that there are failure modes which are triggered by a spin-up. I've never seen field data showing the difference between the two. Often, the spare is up and running, but for whatever reason you'll have a bad block on it, and you'll die during the reconstruct. Periodically checking the spare means reading from and writing to it over time in order to make sure it's still OK. (You take the spare out of the trunk, you look at it, you check the tire pressure, etc.) The issue I see coming down the road is that we'll get into a Golden Gate paint job, where it takes so long to check the spare that we'll just keep the process going constantly. Not as much wear and tear as real I/O, but the spare will still be up and running the entire time, and you won't be able to spin it down. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Marion Hakanson wrote: However, given that the default behavior of ZFS (as of Solaris 10U3) is to panic/halt when it encounters a corrupted block that it can't repair, I'm re-thinking our options, weighing against the possibility of a significant downtime caused by a single-block corruption. Guess what happens when UFS finds an inconsistency it can't fix, either? The issue is not that ZFS finds the inconsistency in the first place; it's that ZFS has the chance to fix it if the zpool is a mirror or raidz. ZFS will simply find more of them, given a set of errors, than other filesystems will. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Project Proposal: Availability Suite
Nicolas Williams wrote: On Fri, Jan 26, 2007 at 05:15:28PM -0700, Jason J. W. Williams wrote: Could the replication engine eventually be integrated more tightly with ZFS? That would be a slick alternative to send/recv. But a continuous zfs send/recv would be cool too. In fact, I think ZFS tightly integrated with SNDR wouldn't be that much different from a continuous zfs send/recv. Even better with snapshots, and scoreboarding, and sync vs. async, and on and on. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss