Re: [zfs-discuss] reliable, enterprise worthy JBODs?
Rocky,

Does DataON manufacture these units, or are they LSI OEM?

-marc

Sent from my iPhone
416.414.6271

On 2011-01-25, at 2:53 PM, Rocky Shek roc...@dataonstorage.com wrote:

Philip,

You can consider the DataON DNS-1600 4U 24-bay 6Gb/s SAS JBOD storage:
http://dataonstorage.com/dataon-products/dns-1600-4u-6g-sas-to-sas-sata-jbod-storage.html

It is a good fit for ZFS storage applications, and can serve as a replacement for the Sun/Oracle J4400 and J4200. There is also the ultra-density DNS-1660 4U 60-bay 6Gb/s SAS JBOD storage, plus other form factors:
http://dataonstorage.com/dataon-products/6g-sas-jbod/dns-1660-4u-60-bay-6g-35inch-sassata-jbod.html

Rocky

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Philip Brown
Sent: Tuesday, January 25, 2011 10:05 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] reliable, enterprise worthy JBODs?

So, another hardware question :)

ZFS has been touted as taking maximal advantage of disk hardware, to the point where it can be used efficiently and cost-effectively on JBODs, rather than having to throw more expensive RAID arrays at it. The only trouble is... JBODs seem to have disappeared :(

Sun/Oracle has discontinued its J4000 line, with no replacement that I can see. IBM seems to have some nice-looking hardware in the form of its EXP3500 expansion trays... but they only support it connected to an IBM (SAS) controller... which is only supported when plugged into IBM server hardware :(

Any other suggestions for (large-)enterprise-grade, supported JBOD hardware for ZFS these days? Either fibre or SAS would be okay.
-- This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
That's a great deck, Chris.

-marc

Sent from my iPhone

On 2010-11-27, at 10:34 AM, Christopher George cgeo...@ddrdrive.com wrote:

> I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd be interested if anyone else has.

I recently presented at the OpenStorage Summit 2010 and compared exactly the three devices you mention in your post (Vertex 2 EX, Vertex 2 Pro, and the DDRdrive X1) as ZIL accelerators. Jump to slide 37 for the write IOPS benchmarks:
http://www.ddrdrive.com/zil_accelerator.pdf

> and you *really* want to make sure you get the 4k alignment right

Excellent point; starting on slide 66 the performance impact of partition misalignment is illustrated. Considering the results, longevity might be an even greater concern than decreased IOPS, as ZIL acceleration is a worst-case workload for a Flash-based SSD.

> The DDRdrive is still the way to go for the ultimate ZIL acceleration, but it's pricey as hell.

In addition to product cost, I believe IOPS/$ is a relevant point of comparison. Google Products gives the price range for the OCZ 50GB SSDs:

Vertex 2 EX (OCZSSD2-2VTXEX50G): $870 - $1,011 USD
Vertex 2 Pro (OCZSSD2-2VTXP50G): $399 - $525 USD

4KB sustained and aligned mixed write IOPS results (see the PDF above):

Vertex 2 EX: 6325 IOPS
Vertex 2 Pro: 3252 IOPS
DDRdrive X1: 38701 IOPS

Using the lowest online price for both the Vertex 2 EX and Vertex 2 Pro, and the full list price (SRP) of the DDRdrive X1, IOPS per dollar:

Vertex 2 EX: 6325 IOPS / $870 = 7.27
Vertex 2 Pro: 3252 IOPS / $399 = 8.15
DDRdrive X1: 38701 IOPS / $1,995 = 19.40

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com

-- This message posted from opensolaris.org
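The IOPS-per-dollar arithmetic above is easy to sanity-check; a minimal sketch, using only the IOPS figures and prices quoted in the post:

```python
# Reproduce the IOPS/$ comparison from Christopher's post.
# Prices are the lowest quoted street prices for the OCZ drives
# and the list price (SRP) for the DDRdrive X1, as stated above.
devices = {
    "Vertex 2 EX":  {"iops": 6325,  "price": 870},
    "Vertex 2 Pro": {"iops": 3252,  "price": 399},
    "DDRdrive X1":  {"iops": 38701, "price": 1995},
}

for name, d in devices.items():
    ratio = d["iops"] / d["price"]
    print(f"{name}: {ratio:.2f} IOPS/$")  # 7.27, 8.15, 19.40
```

The ratios match the post: the Pro wins on IOPS/$ among the Flash drives, and the DRAM-based X1 leads overall despite its higher sticker price.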
Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS
Nice write-up, Marc. Aren't the SuperMicro cards their funny UIO form factor? Wouldn't want someone buying a card that won't work in a standard chassis.

-marc

On Tue, May 18, 2010 at 2:26 AM, Marc Bevand m.bev...@gmail.com wrote:

The LSI SAS1064E slipped through the cracks when I built the list. This is a 4-port PCIe x8 HBA with very good Solaris (and Linux) support. I don't remember having seen it mentioned on zfs-discuss@ before, even though many were looking for 4-port controllers. Perhaps the fact that it is priced too close to 8-port models explains why it has gone relatively unnoticed. That said, the wide x8 PCIe link makes it the *cheapest* controller able to feed 300-350MB/s to at least 4 ports concurrently. Now added to my list.

-mrb
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Michael,

What makes you think striping the SSDs would be faster than round-robin?

-marc

On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan michael.p.sulli...@mac.com wrote:

Everyone, thanks for the help. I really appreciate it.

Well, I actually walked through the source code with an associate today and we found out how things work by looking at the code. It appears that the L2ARC is just assigned in round-robin fashion. If a device goes offline, then it goes to the next and marks that one as offline. The failure to retrieve the requested object is treated like a cache miss and everything goes along its merry way, as far as we can tell.

I would have hoped it to be different in some way. If the L2ARC were striped for performance reasons, that would be really cool, using that device as an extension of the VM model it is modeled after. Which would mean using the L2ARC as an extension of the virtual address space and striping it to make it more efficient. If it took out the bad device and reconfigured the stripe, that would be even cooler; replacing it with a hot spare, cooler still.

However, it appears from the source code that the L2ARC is just a (sort of) jumbled collection of ZFS objects. Yes, it gives you better performance if you have it, but it doesn't really use it in a way you might expect something as cool as ZFS to.

I understand why it is read-only, and that it invalidates its cache when a write occurs, to be expected for any object written. If an object is not there because of a failure or because it has been removed from the cache, it is treated as a cache miss, all well and good - go fetch from the pool.

I also understand why the ZIL is important and that it should be mirrored if it is to be on a separate device. Though I'm wondering how it is handled internally when there is a failure of one of its default devices; but then again, it's on a regular pool and should be redundant enough, with only some degradation in speed.
Breaking these devices out from their default locations is great for performance, and I understand that. I just wish the knowledge of how they work and their internal mechanisms were not so much of a black box. Maybe that is due to the speed at which ZFS is progressing and the features it adds with each subsequent release.

Overall, I am very impressed with ZFS, its flexibility and, even more so, its breaking all the rules about how storage should be managed - and I really like it. I have yet to see anything come close in its approach to disk data management. Let's just hope it keeps moving forward; it is truly a unique way to view disk storage.

Anyway, sorry for the ramble, but to everyone, thanks again for the answers.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 7 May 2010, at 00:00, Robert Milkowski wrote:

On 06/05/2010 15:31, Tomas Ögren wrote:

On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:

On Wed, 5 May 2010, Edward Ned Harvey wrote:

> In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it.

How do you know that I don't need it? The ability seems useful to me.

The gain is quite minimal. If the first device fails (which doesn't happen too often, I hope), the block will be read from the normal pool once and then stored in the ARC/L2ARC again. It just behaves like a cache miss for that specific block. If this happens often enough to become a performance problem, then you should throw away that L2ARC device because it's broken beyond usability.

Well, if an L2ARC device fails there might be an unacceptable drop in delivered performance. If it were mirrored then the drop usually would be much smaller, or there could be no drop at all if the mirror had an option to read from only one side.
Being able to mirror the L2ARC might be especially useful once a persistent L2ARC is implemented, since after a node restart or a resource failover in a cluster the L2ARC will be kept warm. Then the only thing which might considerably affect L2 performance would be an L2ARC device failure.

--
Robert Milkowski
http://milek.blogspot.com
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
The L2ARC will continue to function.

-marc

On 5/4/10, Michael Sullivan michael.p.sulli...@mac.com wrote:

Hi,

I have a question I cannot seem to find an answer to. I know I can set up a stripe of L2ARC SSDs with, say, 4 SSDs. I know that if I set up the ZIL on an SSD and the SSD goes bad, the ZIL will be relocated back to the pool. I'd probably have it mirrored anyway, just in case. However, you cannot mirror the L2ARC, so... what I want to know is: what happens if one of those SSDs goes bad? What happens to the L2ARC? Is it just taken offline, or will it continue to perform even with one drive missing?

Sorry if these questions have been asked before, but I cannot seem to find an answer.

Mike

---
Michael Sullivan
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

-- Sent from my mobile device
Re: [zfs-discuss] ZFS where to go!
Richard,

My challenge to you is that at least three vendors that I know of built their storage platforms on FreeBSD. One of them sells $4bn/year of product - pretty sure that eclipses all (Open)Solaris-based storage ;)

-marc

On 3/26/10, Richard Elling richard.ell...@gmail.com wrote:

On Mar 26, 2010, at 4:46 AM, Edward Ned Harvey wrote:

> What does everyone think about that? I bet it is not as mature as on OpenSolaris.

"Mature" is not the right term in this case. FreeBSD has been around much longer than OpenSolaris, and it's equally if not more mature.

Bill Joy might take offense at this statement. Both FreeBSD and Solaris trace their roots to the work done at Berkeley 30 years ago. Both have evolved in different ways at different rates. Since Solaris targets the enterprise market, I will claim that Solaris is proven in that space. OpenSolaris is just one of the next steps forward for Solaris.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com

-- Sent from my mobile device
Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
On Thu, Mar 18, 2010 at 2:44 PM, Chris Murray chrismurra...@gmail.com wrote:

Good evening. I understand that NTFS and VMDK do not relate to Solaris or ZFS, but I was wondering if anyone has any experience of checking the alignment of data blocks through that stack?

NetApp has a great little tool called mbrscan/mbralign. It's free, but I'm not sure if NetApp customers are supposed to distribute it.

-marc
Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?
On Tue, Mar 16, 2010 at 2:46 PM, Svein Skogen sv...@stillbilde.net wrote:

> Not quite a one-liner. After you create the target once (step 3), you do not have to do that again for the next volume. So three lines.

So ... no way around messing with GUID numbers?

I'll write you a Perl script :)

-marc
Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?
On Tue, Mar 16, 2010 at 3:16 PM, Svein Skogen sv...@stillbilde.net wrote:

> I'll write you a Perl script :)

I think there are ... several people that'd like a script that gave us back some of the ease of the old shareiscsi one-off, instead of having to spend time on copy-and-pasting GUIDs they have ... no real use for. ;)

I'll try and knock something up in the next few days, then!

-marc
[zfs-discuss] [OT] Interesting ranking of MLC SSDs
Given that quite a few folk ask "which is the best SSD?", I thought some might find the following interesting:

http://www.storagenewsletter.com/news/flash/dramexchange-intel-ssds

-marc

P.S.: Apologies if the slightly off-topic post offends anyone.
Re: [zfs-discuss] Recommendations for an l2arc device?
On Fri, Feb 26, 2010 at 2:43 PM, Brandon High bh...@freaks.com wrote:

snip

The drives I'm considering are:
OCZ Vertex 30GB
Intel X25-V 40GB
Crucial CT64M225 64GB

Personally, I'd go with the Intel product... but save a few more pennies and get the X25-M. The extra boost on read and write performance is worth it.

-marc
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On Fri, Feb 26, 2010 at 2:42 PM, Lutz Schumann presa...@storageconcepts.de wrote:

Now if a virtual machine writes to the zvol, blocks are allocated on disk. Reads are then served partly from disk (for all blocks written) and partly from the ZFS layer (for all unwritten blocks). If the virtual machine (which may be VMware / Xen / Hyper-V) deletes blocks / frees space within the zvol, this also means a write - usually in the metadata area only. Thus the underlying storage system does not know which blocks in a zvol are really used.

You're using VMs and *not* using dedupe?! VMs are almost the perfect use case for dedupe :)

-marc
Re: [zfs-discuss] [indiana-discuss] future of OpenSolaris
On Wed, Feb 24, 2010 at 2:02 PM, Troy Campbell troy.campb...@fedex.com wrote:

http://www.oracle.com/technology/community/sun-oracle-community-continuity.html

Halfway down it says:

> Will Oracle support Java and OpenSolaris User Groups, as Sun has?
>
> Yes, Oracle will indeed enthusiastically support the Java User Groups, OpenSolaris User Groups, and other Sun-related user group communities (including the Java Champions), just as Oracle actively supports hundreds of product-oriented user groups today. We will be reaching out to these groups soon.

Supporting doesn't necessarily mean continuing the open-source projects!

-marc
Re: [zfs-discuss] Opensolaris 2010.03 / snv releases
Isn't the dedupe bug fixed in snv_133?

-marc

On Tue, Feb 23, 2010 at 9:21 AM, Jeffry Molanus jeffry.mola...@proact.nl wrote:

There is no clustering package for it, and the available source seems very old; also, the dedup bug is there, IIRC. So if you don't need HA clustering and dedup...

BR, Jeffry

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bruno Sousa
Sent: dinsdag 23 februari 2010 8:37
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Opensolaris 2010.03 / snv releases

Hi all,

According to what I have been reading, OpenSolaris 2010.03 should be released around March this year, but with all the process of the Oracle/Sun deal I was wondering if anyone knows whether this schedule still makes sense, and if not, whether snv_132/133 looks very similar to the future 2010.03. In other words, without waiting for OpenSolaris 2010.03, would anyone risk putting a box with snv_132/133 into production?

Thanks,
Bruno
Re: [zfs-discuss] Import zpool from FreeBSD in OpenSolaris
send and receive?!

-marc

On Tue, Feb 23, 2010 at 9:25 PM, Thomas Burgess wonsl...@gmail.com wrote:

When I needed to do this, the only way I could get it to work was this:

Take some disks, use an OpenSolaris Live CD and label them EFI
Create a zpool in FreeBSD with these disks
Copy my data from FreeBSD to the new zpool
Export the pool
Import the pool

On Tue, Feb 23, 2010 at 9:11 PM, patrik s...@dentarg.net wrote:

I want to import my zpools from FreeBSD 8.0 in OpenSolaris 2009.06. After reading the few posts (links below) I was able to find on the subject, it seems like there is a difference between FreeBSD and Solaris. FreeBSD operates directly on the disk and Solaris creates a partition and uses that... is that right? Is it impossible for OpenSolaris to use zpools from FreeBSD?

* http://opensolaris.org/jive/thread.jspa?messageID=445766
* http://opensolaris.org/jive/thread.jspa?messageID=450755
* http://mail.opensolaris.org/pipermail/ug-nzosug/2009-June/27.html

This is zpool import from my machine with OpenSolaris 2009.06 (all zpools are fine in FreeBSD). Notice that the zpool named "temp" can be imported. Why not "secure" then? Is it because it is raidz1?

  pool: secure
    id: 15384175022505637073
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        secure        UNAVAIL  insufficient replicas
          raidz1      UNAVAIL  insufficient replicas
            c8t1d0p0  ONLINE
            c8t2d0s2  ONLINE
            c8t3d0s8  UNAVAIL  corrupted data
            c8t4d0s8  UNAVAIL  corrupted data

  pool: temp
    id: 10889808377251842082
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        temp        ONLINE
          c8t0d0p0  ONLINE

-- This message posted from opensolaris.org
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
On Thu, Feb 18, 2010 at 10:49 AM, Matt registrat...@flash.shanje.com wrote:

Here's iostat while doing writes:

    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    1.0  256.9    3.0  2242.9   0.3   0.1     1.3     0.5  11  12  c0t0d0
    0.0  253.9    0.0  2242.9   0.3   0.1     1.0     0.4  10  11  c0t1d0
    1.0  253.9    2.5  2234.4   0.2   0.1     0.9     0.4   9  11  c1t0d0
    1.0  258.9    2.5  2228.9   0.3   0.1     1.3     0.5  12  13  c1t1d0

This shows about a 10-12% utilization of my gigabit network, as reported by Task Manager in Windows 7.

Unless you are using SSDs (which I believe you're not), you're IOPS-bound on the drives IMHO. Writes are a better test of this than reads, for cache reasons.

-marc
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
Run Bonnie++. You can install it with the Sun package manager, and it will appear under /usr/benchmarks/bonnie++. Look for the command line I posted a couple of days back for a decent set of flags to truly rate performance (using sync writes).

-marc

On Thu, Feb 18, 2010 at 11:05 AM, Matt registrat...@flash.shanje.com wrote:

Also - still looking for the best way to test local performance. I'd love to make sure that the volume is actually able to perform at a level, locally, that can saturate gigabit. If it can't do it internally, why should I expect it to work over GbE?

-- This message posted from opensolaris.org
[zfs-discuss] Bonnie++ stats
Anyone else got stats to share? Note: the below is 4x Caviar Black 500GB drives, 1x Intel X25-M set up as both ZIL and L2ARC, a decent ASUS mobo, and 2GB of fast RAM.

-marc

r...@opensolaris130:/tank/myfs# /usr/benchmarks/bonnie++/bonnie++ -u root -d /tank/myfs -f -b
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
opensolaris130   4G           49503  13 30468   9           67882   6 320.1   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4225  30 +++++ +++  4709  24  3407  38 +++++ +++  4572  22
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
Definitely use COMSTAR, as Tim says.

At home I'm using 4x WD Caviar Blacks on an AMD Phenom X4 @ 1.8GHz with only 2GB of RAM, running snv_132. No HBA - just onboard SB700 SATA ports. I can, with IOmeter, saturate GigE from my WinXP laptop via iSCSI.

Can you toss the RAID controller aside and use motherboard SATA ports with just a few drives? That could help highlight whether it's the RAID controller or not, and even one drive has better throughput than you're seeing. Cache, ZIL, and vdev tweaks are great, but you're not hitting any of those bottlenecks, I can assure you.

-marc

On 2/10/10, Tim Cook t...@cook.ms wrote:

On Wed, Feb 10, 2010 at 4:06 PM, Brian E. Imhoff beimh...@hotmail.com wrote:

I am in the proof-of-concept phase of building a large ZFS/Solaris-based SAN box, and am experiencing absolutely poor / unusable performance. Where to begin...

The hardware setup:

Supermicro 4U 24-drive-bay chassis
Supermicro X8DT3 server motherboard
2x Xeon E5520 Nehalem 2.26GHz quad-core CPUs
4GB memory
Intel EXPI9404PT 4-port gigabit server network card (used for iSCSI traffic only)
Adaptec 52445 28-port SATA/SAS RAID controller connected to 24x Western Digital WD1002FBYS 1TB enterprise drives

I have configured the 24 drives as single simple volumes in the Adaptec RAID BIOS, and am presenting them to the OS as such. I then create a zpool using raidz2 across the drives, with 1 as a hot spare:

zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare c1t23d0

Then create a volume store:

zfs create -o canmount=off tank/volumes

Then create a 10 TB volume to be presented to our file server:

zfs create -V 10TB -o shareiscsi=on tank/volumes/fsrv1data

From here, I discover the iSCSI target on our Windows Server 2008 R2 file server, and see the disk attached in Disk Management. I initialize the 10TB disk fine, and begin to quick-format it. Here is where I begin to see the poor performance: the quick format took about 45 minutes.
And once the disk is fully mounted, I get maybe 2-5 MB/s average to this disk. I have no clue what I could be doing wrong. To my knowledge, I followed the documentation for setting this up correctly, though I have not looked at any tuning guides beyond the first line saying you shouldn't need to do any of this, as the people who picked these defaults know more about it than you.

Jumbo frames are enabled on both sides of the iSCSI path, as well as on the switch, and rx/tx buffers are increased to 2048 on both sides as well. I know this is not a hardware / iSCSI network issue: as another test, I installed Openfiler in a similar configuration (using hardware RAID) on this box, and was getting 350-450 MB/s from our file server.

An iostat -xndz 1 readout of the %b column during a file copy to the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2 seconds of 100, and repeats.

Is there anything I need to do to get this usable? Or any additional information I can provide to help solve this problem? As nice as Openfiler is, it doesn't have ZFS, which is necessary to achieve our final goal.

You're extremely light on RAM for a system with 24TB of storage and two E5520s. I don't think it's the entire source of your issue, but I'd strongly suggest considering doubling what you have as a starting point. What version of OpenSolaris are you using? Have you considered using COMSTAR as your iSCSI target?

--Tim

-- Sent from my mobile device
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
How does lowering the flush interval help? If he can't ingress data fast enough, faster flushing is a Bad Thing(tm).

-marc

On 2/10/10, Kjetil Torgrim Homme kjeti...@linpro.no wrote:

Bob Friesenhahn bfrie...@simple.dallas.tx.us writes:

On Wed, 10 Feb 2010, Frank Cusack wrote:

> The other three commonly mentioned issues are:
> - Disable the Nagle algorithm on the Windows clients.

For iSCSI? Shouldn't be necessary.

> - Set the volume block size so that it matches the client filesystem block size (default is 128K!).

The default for a zvol is 8 KiB.

> - Check for an abnormally slow disk drive using 'iostat -xe'.

His problem is lazy ZFS; notice how it gathers up data for 15 seconds before flushing it to disk. Tweaking the flush interval down might help.

> An iostat -xndz 1 readout of the %b column during a file copy to the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2 seconds of 100, and repeats.

What are the other values? I.e., the number of ops and the actual amount of data read/written.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance
This is a Windows box, not a DB that flushes every write. The drives are capable of over 2000 IOPS (albeit with high latency, as it's NCQ that gets you there), which would mean, even with sync flushes, 8-9MB/sec.

-marc

On 2/10/10, Brent Jones br...@servuhome.net wrote:

ZIL performance issues? Is the write cache enabled on the LUNs?

--
Brent Jones
br...@servuhome.net

-- Sent from my mobile device
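The back-of-envelope behind that 8-9MB/sec figure can be made explicit. This sketch assumes each synchronous write moves a 4 KiB block, which is an assumption for illustration, not a size measured in the thread:

```python
# Throughput implied by a drive's IOPS ceiling under sync writes,
# assuming 4 KiB per flushed write (an illustrative assumption).
iops = 2000
block_bytes = 4096  # 4 KiB

mb_per_sec = iops * block_bytes / 1e6
print(f"{mb_per_sec:.1f} MB/s")  # prints "8.2 MB/s"
```

At roughly 8 MB/s even in the fully synchronous worst case, the 2-5 MB/s Brian reports sits well below what the spindles alone should sustain, which is Marc's point.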
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I think you'll do just fine then. And I think the extra platter will work to your advantage.

-marc

On 2/3/10, Simon Breden sbre...@gmail.com wrote:

Probably 6 in a RAID-Z2 vdev.

Cheers,
Simon

-- This message posted from opensolaris.org

-- Sent from my mobile device
Re: [zfs-discuss] Cores vs. Speed?
I would go with cores (threads) rather than clock speed here. My home system is a 4-core AMD @ 1.8GHz and performs well. I wouldn't use drives that big, and you should be aware of the overheads of RAID-Z[x].

-marc

On Thu, Feb 4, 2010 at 6:19 PM, Brian broco...@vt.edu wrote:

I am starting to put together a home NAS server that will have the following roles:

(1) Store TV recordings from SageTV over either iSCSI or CIFS, up to 4 or 5 HD streams at a time. These will be streamed live to the NAS box during recording.
(2) Play back TV (could be the stream being recorded, could be others) to 3 or more extenders.
(3) Hold a music repository.
(4) Hold backups from Windows machines, Mac (Time Machine), and Linux.
(5) Be an iSCSI target for several different VirtualBoxes.

Function 4 will use compression and deduplication. Function 5 will use deduplication.

I plan to start with 5x 1.5 TB drives in a raidz2 configuration and 2 mirrored boot drives. I have been reading these forums off and on for about 6 months trying to figure out how best to piece this system together. I am first trying to select the CPU, and am leaning towards AMD because of ECC support and power consumption.

For items such as deduplication, compression, checksums, etc., is it better to get a faster clock speed, or should I consider more cores? I know certain functions such as compression may run on multiple cores. I have so far narrowed it down to:

AMD Phenom II X2 550 Black Edition Callisto 3.1GHz
AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core

as they are roughly the same price.

-- This message posted from opensolaris.org
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
Very interesting stats -- thanks for taking the time and trouble to share them!

One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the E). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same.)

-marc

On Thu, Feb 4, 2010 at 6:43 PM, Peter Radig pe...@radig.de wrote:
> I was interested in the impact the type of an SSD has on the performance of the ZIL. So I did some benchmarking and just want to share the results.
>
> My test case is simply untarring the latest ON source (528 MB, 53k files) on a Linux system that has a ZFS file system mounted via NFS over gigabit ethernet. I got the following results:
>
> - locally on the Solaris box: 30 sec
> - remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared to local)
> - remotely with ZIL disabled: 1 min 54 sec (factor 3.8 compared to local)
> - remotely with an OCZ VERTEX SATA II 120 GB as ZIL device: 14 min 40 sec (factor 29.3 compared to local)
> - remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor 6.4 compared to local)
>
> So it really makes a difference what type of SSD you use for your ZIL device. I was expecting good performance from the X25-E, but was really surprised that it is that good (only 1.7 times slower than with ZIL completely disabled). So I will use the X25-E as ZIL device on my box and will not consider disabling ZIL at all to improve NFS performance.
>
> -- Peter
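For anyone double-checking Peter's numbers, the quoted slowdown factors follow directly from the measured untar times against the 30-second local baseline (this is just my arithmetic sketch, not part of the original post):

```python
LOCAL_SECS = 30  # local untar on the Solaris box

def slowdown(minutes: int, seconds: int) -> float:
    """Elapsed time expressed as a multiple of the local baseline."""
    return (minutes * 60 + seconds) / LOCAL_SECS

print(round(slowdown(36, 37), 1))  # no dedicated ZIL device: 73.2
print(round(slowdown(1, 54), 1))   # ZIL disabled:             3.8
print(round(slowdown(14, 40), 1))  # OCZ Vertex slog:         29.3
print(round(slowdown(3, 11), 1))   # Intel X25-E slog:         6.4
```

The X25-E's 6.4 versus 3.8 for a disabled ZIL is where the "only 1.7 times slower" figure comes from.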
Re: [zfs-discuss] Cores vs. Speed?
On Thu, Feb 4, 2010 at 7:54 PM, Brian broco...@vt.edu wrote:
> It sounds like the consensus is more cores over clock speed. Surprising to me since the difference in clock speed was over 1GHz. So, I will go with a quad core.

Four cores @ 1.8GHz = 7.2GHz of threaded performance ([Open]Solaris is relatively decent in terms of threading). Two cores @ 3.1GHz = 6.2GHz :) You may find single-threaded operations slower, as someone pointed out, but even those might wash out, as sometimes it's I/O that's the problem.

> I was leaning towards 4GB of RAM - which hopefully should be enough for dedup as I am only planning on dedupping my smaller file systems (backups and VMs).

4GB is a good start.

> Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.

You are sort-of-correct that it's the write speed of the slowest disk. Mirrored drives will be faster, especially for random I/O, but you sacrifice storage for that performance boost. That said, I have a similar setup as far as number of spindles and can push 200MB/sec+ through it and saturate GigE for iSCSI, so maybe I'm being harsh on raidz2 :)

> Now on to the hard part of picking a motherboard that is supported and has enough SATA ports!

I used an ASUS board (M4A785-M) which has six (6) SATA2 ports onboard and pretty decent HyperTransport throughput.

Hope that helps.

-marc
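The "aggregate GHz" comparison above is worth spelling out, with the caveat that it only holds for workloads that parallelize well (checksumming and compression across many streams do; a single untar does not). A trivial sketch of the arithmetic, my own illustration:

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Naive aggregate throughput, assuming a perfectly threaded workload."""
    return cores * clock_ghz

print(aggregate_ghz(4, 1.8))  # Phenom X4 9150e quad-core: 7.2
print(aggregate_ghz(2, 3.1))  # Phenom II X2 550 dual-core: 6.2
```

Single-threaded tasks still run at the per-core clock, which is why the dual-core would win those.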
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:18 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> On Thu, 4 Feb 2010, Marc Nicholas wrote:
>> Very interesting stats -- thanks for taking the time and trouble to share them! One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the E). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same.)
>
> The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data.

Would you like to qualify your statement further? While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make data get completely lost. ;)

-marc
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:35 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> On Thu, 4 Feb 2010, Marc Nicholas wrote:
>>> The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data.
>>
>> Would you like to qualify your statement further?
>
> Google is your friend. And check earlier on this list/forum as well.
>
>> While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make data get completely lost. ;)
>
> It loses the most recently written data, even after a cache sync request. A number of people have verified this for themselves and posted results. Even the X25-E has been shown to lose some transactions.

The devices have some DRAM (16MB) that is used for write amplification levelling. The sudden loss of power means that this DRAM doesn't get flushed to flash. This is the very reason the STEC devices have a supercap.

-marc
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
As I previously mentioned, I'm pretty happy with the 500GB Caviar Blacks that I have :) One word of caution: failure and rebuild times with 1TB+ drives can be a concern. How many spindles were you planning?

-marc

On 2/3/10, Simon Breden sbre...@gmail.com wrote:
> Sounds good. I was taking a look at the 1TB Caviar Black drives, which are WD1001FALS I think. They seem to have superb user ratings and good reliability comments from many people.
>
> I consider these full-fat drives as opposed to the LITE (green) drives: they spin at 7200 rpm instead of 5400 rpm, have higher performance and burn more juice than the Green models, but they have superb reviews from almost everyone regarding behaviour and reliability, and at the end of the day, we need good, reliable drives that work well in a RAID system. I can get them for around the same price as the cheapest 1.5TB green drives from Samsung.
>
> Somewhere I saw people saying that WDTLER.EXE works to allow reduction of the error-reporting time like the enterprise RE versions (RAID Edition). However, I then saw another user saying that on the newer revisions WD have disabled this. I need to check a bit more to see what's really the case.
>
> Cheers,
> Simon
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives
I agree wholeheartedly: you're paying to make the problem go away in an expedient manner. That said, I see how much we spend on NetApp storage at work and it makes me shudder ;)

I think someone was wondering if the large storage vendors have their own microcode on drives? I can tell you that NetApp do...and that's one way they lock you in (if the drive doesn't report NetApp firmware, the filer will reject the drive) and also how they do tricks like soft-failure/re-validation, 520-byte sectors, etc.

-marc

On Tue, Feb 2, 2010 at 11:12 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> On Tue, 2 Feb 2010, David Dyer-Bennet wrote:
>> Now, I'm sure not ALL drives offered at Newegg could qualify; but the question is, how much do I give up by buying an enterprise-grade drive from a major manufacturer, compared to the Sun-certified drive?
>
> If you have a Sun service contract, you give up quite a lot. If a Sun drive fails every other day, then Sun will replace that Sun drive every other day, even if the system warranty has expired. But if it is a non-Sun drive, then you have to deal with a disinterested drive manufacturer, which could take weeks or months.
>
> My experience thus far is that if you pay for a Sun service contract, then you should definitely pay extra for Sun-branded parts. Hopefully Oracle will do better than Sun at explaining the benefits and services provided by a service contract.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Tue, Feb 2, 2010 at 1:38 PM, Brandon High bh...@freaks.com wrote:
> On Sat, Jan 16, 2010 at 9:47 AM, Simon Breden sbre...@gmail.com wrote:
>> Which consumer-priced 1.5TB drives do people currently recommend?
>
> I happened to be looking at the Hitachi product information, and noticed that the Deskstar 7K2000 appears to be supported in RAID configurations. One of the applications listed is "Video editing arrays".
> http://www.hitachigst.com/portal/site/en/products/deskstar/7K2000/

I've been having good success with the Western Digital Caviar Black drives...which are cousins of their enterprise RE3 platform. AFAIK, you're stuck at 1TB or 2TB capacities, but I've managed to get some good deals on them...

-marc
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I'm running the 500GB models myself, but I wouldn't say they're overly noisy, and I've been doing ZFS/iSCSI/IOMeter/Bonnie++ stress testing with them. They whine rather than click, FYI.

-marc

On Tue, Feb 2, 2010 at 2:58 PM, Simon Breden sbre...@gmail.com wrote:
> IIRC the Black range are meant to be the 'performance' models and so are a bit noisy. What's your opinion? And the 2TB models are not cheap either for a home user. The 1TB seems a good price. And from what little I read, it seems you can control the error-reporting time with the WDTLER.EXE utility :)
>
> Cheers,
> Simon
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Tue, Feb 2, 2010 at 3:11 PM, Frank Cusack frank+lists/z...@linetwo.net wrote:
> That said, I doubt 2TB drives represent good value for a home user. They WILL fail more frequently, and as a home user you aren't likely to be keeping multiple spares on hand to avoid warranty replacement time.

I'm having a hard time convincing myself to go beyond 500GB, both for performance (I'm trying to build something with reasonable IOPS) and reliability reasons.

-marc