Re: [zfs-discuss] ZFS on solid state as disk rather than L2ARC...
14x 256 GB MLC SSDs in a raidz2 array have worked fine for us. Performance seems to be mostly limited by the RAID controller operating in JBOD mode. Raidz2 allows sufficient redundancy to replace any MLC drives that develop issues, and when you have that many consumer-level SSDs, some will develop issues until you get all the weak ones weeded out. If I had more space in the box, or didn't need quite as much space, I would have split the array into 2 or more striped raidz2 arrays. Your mileage may vary. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
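For what it's worth, the two layouts the post contrasts would be created along these lines. This is a sketch only; the pool name and device names (c0t0d0 and so on) are hypothetical, not from the post.

```shell
# Sketch only -- pool and device names are hypothetical.
# One wide 14-disk raidz2, as described in the post:
make_single_raidz2() {
  zpool create tank raidz2 \
    c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0 c0t12d0 c0t13d0
}

# The alternative mentioned: two striped 7-disk raidz2 vdevs in one pool,
# trading some capacity for better IOPS and faster resilvers.
make_striped_raidz2() {
  zpool create tank \
    raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    raidz2 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0 c0t12d0 c0t13d0
}
```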
Re: [zfs-discuss] ZFS and VMware
We are using ZFS-backed fibre targets for ESXi 4.1 (and previously 4.0) and have had good performance with no issues. The fibre LUNs were formatted with VMFS by the ESXi boxes. SQLIO benchmarks from a guest system running on a fibre-attached ESXi host:

File Size MB  Threads  R/W  Duration  Sector KB  Pattern  IOs outstanding  IO/Sec  MB/Sec  Lat. Min.  Lat. Avg.  Lat. Max.
24576         8        R    30        8          random   64               37645   294     0          1          141
24576         8        W    30        8          random   64               17304   135     0          3          303
24576         8        R    30        64         random   64               6250    391     1          9          176
24576         8        W    30        64         random   64               5742    359     1          10         203

The array is a raidz2 with 14 x 256 GB Patriot Torqx drives and a cache with 4 x 32 GB Intel G1s. When I get around to doing the next series of boxes I'll probably use C300s in place of the Indilinx-based drives. iSCSI was disappointing and seemed to be CPU bound, possibly by a stupid amount of interrupts coming from the less-than-stellar NIC on the test box. NFS we have only used as an ISO store, but it has worked OK and without issues.
Re: [zfs-discuss] zpool 'stuck' after failed zvol destory and reboot
For ARC reasons if no other, I would max it out to the 8 GB regardless.
Re: [zfs-discuss] zpool 'stuck' after failed zvol destory and reboot
Assuming there are no other volumes sharing slices of those disks, why import? Just overwrite the disks with a new pool, using the -f flag during creation. I'm just sayin', since you were destroying the volume anyway, I presume there is no data we are trying to preserve here.
Re: [zfs-discuss] ZFS read performance terrible
Hi r2ch

> The operations column shows about 370 operations for read - per spindle (between 400-900 for writes). How should I be measuring iops?

It seems to me then that your spindles are going about as fast as they can and you're just moving small block sizes. There are lots of ways to test for iops, but for this purpose IMO the operations column is fine. I think the next step would be to attach a couple of inexpensive SSDs as cache and ZIL to see what that does, understanding that it will only make a difference on data that is warm for reads, and on commit-required writes.
Re: [zfs-discuss] ZFS read performance terrible
How many iops per spindle are you getting? A rule of thumb I use is to expect no more than 125 iops per spindle for regular HDDs. SSDs are a different story of course. :)
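As a rough sketch of that rule of thumb, the expected ceiling for a pool scales with spindle count. The numbers here are illustrative (a hypothetical 14-spindle pool), not measured:

```shell
# Back-of-the-envelope IOPS ceiling from the 125-IOPS-per-spindle rule of thumb.
# Illustrative numbers only; real results depend on workload and vdev layout.
iops_per_spindle=125
spindles=14
expected_iops=$((spindles * iops_per_spindle))
echo "rough pool random-read IOPS ceiling: $expected_iops"
```

Note that for a single raidz vdev, small random reads tend toward the IOPS of one spindle rather than the sum, so treat the sum as an upper bound for striped/mirrored layouts.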
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
On the receiver:

/opt/csw/bin/mbuffer -m 1G -I Ostor-1:8000 | zfs recv -F e...@sunday
in @ 0.0 kB/s, out @ 0.0 kB/s, 43.7 GB total, buffer 100% full
cannot receive new filesystem stream: invalid backup stream
mbuffer: error: outputThread: error writing to stdout at offset 0xaedf6a000: Broken pipe
summary: 43.7 GByte in 3 min 25.5 sec - average of 218 MB/s
mbuffer: warning: error during output to stdout: Broken pipe

On the sender:

/sbin/zfs send e...@sunday | /opt/csw/bin/mbuffer -m 1G -O Ostor-2:8000
in @ 0.0 kB/s, out @ 0.0 kB/s, 44.7 GB total, buffer 100% full
mbuffer: error: outputThread: error writing to Ostor-2:8000 at offset 0xb2e0e6000: Broken pipe
summary: 44.7 GByte in 3 min 25.6 sec - average of 223 MB/s
mbuffer: warning: error during output to Ostor-2:8000: Broken pipe
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
I'll try an export/import and scrub of the receiving pool and see what that does. I can't take the sending pool offline to try that stuff though.
[zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
I've tried ssh blowfish and scp arcfour. Both are CPU limited long before the 10g link is. I've also tried mbuffer, but I get broken pipe errors part way through the transfer. I'm open to ideas for faster ways to do either zfs send directly, or through a compressed file of the zfs send output. For the moment I: zfs send | pigz, scp (arcfour) the gz file to the remote host, gunzip | zfs receive. This takes a very long time for 3 TB of data, and barely makes use of the 10g connection between the machines due to the CPU limiting on the scp and gunzip processes. Thank you for your thoughts. Richard J.
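The file-based workaround described above can be sketched roughly as follows. Dataset, snapshot, host and path names here are hypothetical placeholders, and `-F` on the receive is an assumption:

```shell
# Sketch of the send-to-file workaround, with hypothetical names throughout.
send_via_file() {
  # Compress the stream locally; pigz parallelizes gzip across 4 cores.
  zfs send tank/data@sunday | pigz -p4 -1 > /var/tmp/sunday.gz
  # Ship the file; arcfour is one of the cheaper ciphers CPU-wise.
  scp -c arcfour /var/tmp/sunday.gz backup@remote:/var/tmp/sunday.gz
  # Decompress remotely and feed zfs receive on the far side.
  ssh backup@remote 'gunzip -c /var/tmp/sunday.gz | zfs receive -F tank/data'
}
```

The intermediate file costs disk space and a second pass over the data, which is part of why this is slower than a direct pipe; its upside is that each stage can be retried independently.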
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
> If this is across a trusted link, have a look at the HPN patches to SSH. There are three main benefits to these patches:
> - increased (and dynamic) buffers internal to SSH
> - adds a multi-threaded AES cipher
> - adds the NONE cipher for non-encrypted data transfers (authentication is still encrypted)

Yes, I've looked at that site before. While I'm comfortable with the ZFS storage aspects of osol, I'm still quite leery of compiling from source code. I really, really don't want to break these machines. If it's a package I can figure out how to get and install it, but code patches scare me.
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
> I've used mbuffer to transfer hundreds of TB without a problem in mbuffer itself. You will get disconnected if the send or receive prematurely ends, though.

mbuffer itself very specifically ends with a broken pipe error: very quickly with -s set to 128, or after some time with -s set over 1024. My only thought at this point is it may be something with either the myri10ge driver, or the fact that the machines are directly connected without an intervening switch. Perhaps someone over in networking may have a thought in that direction. Richard J.
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
Using SunOS X 5.11 snv_133 i86pc i386 i86pc, so the network thing that was fixed in 129 shouldn't be the issue.

-Original Message-
From: Brent Jones [mailto:br...@servuhome.net]
Sent: Monday, July 19, 2010 1:02 PM
To: Richard Jahnel
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?

[quoted original message snipped]

I found builds 130 had issues with TCP. I could reproduce TCP timeouts/socket errors up until I got on 132. I have stayed on 132 so far since I haven't found any other show stoppers. Mbuffer is probably your best bet. I rolled mbuffer into my replication scripts, which I could share if anyone's interested. Older versions of my script are on www.brentrjones.com but I have a new one which uses mbuffer.

-- Brent Jones br...@servuhome.net
Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?
FWIW I found netcat over at CSW. http://www.opencsw.org/packages/CSWnetcat/
Re: [zfs-discuss] does sharing an SSD as slog and l2arc reduces its life span?
TBH write amp was not considered, but since I've never heard of a write amp over 1.5, for my purposes on the 256 GB drives they still last well over the required 5 year life span. Again, it does hurt a lot when you're using smaller drives that have less space available for wear leveling. I suppose for cache drives it will only be a minor annoyance when you have to replace the drive, seeing as a cache failure won't lead to data loss. In my mind it would be more of a concern for a slog drive.
Re: [zfs-discuss] does sharing an SSD as slog and l2arc reduces its life span?
Well, pretty much by definition any writes shorten the drive's life; the more writes, the shorter it is. That said, here is some interesting math that I did before I built my first MLC array. For a certain brand of Indilinx drive I calculated the life span in the following way. Based on the maximum sustained write speed of the drive and the size of the drive (256 GB, by the way), it would take 9 months to overwrite the entire drive once at 100% busy writing. However, I knew that my controller would be lucky to keep all the drives at 25% busy (BTW, it turns out that it's really about 12%), so I took the 9 months and multiplied it by 4, coming up with 36 months. Great, now we're at 3 years, but we're still doing 100% writes and we know that this isn't going to be the case. In fact we expect the absolute worst case scenario is that we'll be doing less than 25% writes. So again I took the 3 years and multiplied it by 4. This comes out to 12 years to wear out my MLC drives. Just in case, I'm calling it 10 years. But you know what? Quite frankly those boxes will be retired in less than 5 years, and even then I'll be surprised if it's still my problem to worry about. Of all the issues that might concern me about using MLC drives, them wearing out isn't really one of them. Of course, if you're using tiny drives, the math changes. In fact under the above scenario, assuming a 32 GB drive went as fast as a 256 GB drive (and they don't, BTW), your 32 GB drive would only last about 18 months. Since it probably only has half the chip count of the larger drive, and is thus only using half its write channels, you probably still have about 3 years of life in the drive running at 25% busy x 25% writes. Just some food for thought.
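The arithmetic in the post, spelled out step by step (the 9-month full-overwrite figure is the post's own estimate for that particular 256 GB drive):

```shell
# Lifespan estimate from the post: 9 months to overwrite the whole drive once
# at 100% busy / 100% writes, then derate for realistic duty cycle.
full_speed_months=9
busy_factor=4    # controller keeps drives at ~25% busy -> 4x longer
write_factor=4   # worst case ~25% of I/O is writes -> another 4x
months=$((full_speed_months * busy_factor * write_factor))
years=$((months / 12))
echo "estimated wear-out horizon: $months months (~$years years)"
```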
Re: [zfs-discuss] OCZ Devena line of enterprise SSD
The EX specs page does list the supercap. The Pro specs page does not.
Re: [zfs-discuss] Crucial RealSSD C300 and cache flush?
I'm interested in the answer to this as well.
Re: [zfs-discuss] ssd pool + ssd cache ?
I'll have to take your word on the Zeus drives. I don't see anything in their literature that explicitly states that cache flushes are obeyed or otherwise protected against power loss. As for OCZ, they cancelled the Vertex 2 Pro, which was to be the one with the supercap. For the moment they are just selling the Vertex 2 and Vertex LE, neither of which has the supercap.
Re: [zfs-discuss] ssd pool + ssd cache ?
And a very nice device it is indeed. However, for my purposes it doesn't work, as it doesn't fit into a 2.5" slot and use SATA/SAS connections. Unfortunately all my PCI Express slots are in use: 2 RAID controllers, 1 Fibre HBA, 1 10gb Ethernet card.
Re: [zfs-discuss] ssd pool + ssd cache ?
Do you lose the data if you lose that 9v feed at the same time the computer loses power?
Re: [zfs-discuss] ssd pool + ssd cache ?
FWIW, I use 4 Intel 32 GB SSDs as read cache for each pool of 10 Patriot Torqx drives, which are running in a raidz2 configuration. No slogs, as I haven't seen a compliant SSD drive yet. I am pleased with the results. The bottleneck really turns out to be the 24 port RAID card they are plugged into. Bonnie++ local: read about 750 MB/sec, rewrite about 450 MB/sec, write about 600 MB/sec, if memory serves. A SQLIO test run from a fibre-connected VMware guest reached over 16,000 IOPS for 8k random reads. Because the VMware host only has a 4gb Fibre card, max reads were limited to a hair under 400 MB/sec. Using several guests on two VMware hosts achieved 690 MB/sec reads combined.
[zfs-discuss] why both dedup and compression?
I've googled this for a bit, but can't seem to find the answer. What does compression bring to the party that dedupe doesn't cover already? Thank you for your patience and answers.
Re: [zfs-discuss] why both dedup and compression?
Hmm... to clarify: every discussion or benchmark that I have seen always shows both off, compression only, or both on. Why never compression off and dedup on? After some further thought... perhaps it's because compression works at the byte level and dedup is at the block level. Perhaps I have answered my own question. Some confirmation would be nice though.
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
For the l2arc you want iops, pure and simple. For this I think the Intel SSDs are still king. The slog however has a gotcha: you want iops, but you also want something that doesn't say it's done writing until the write is safely nonvolatile. The Intel drives fail in this regard. So far I'm thinking the best bet will likely be one of the SandForce SF-1500 based drives with the supercap on it, something like the Vertex 2 Pro. These are of course just my thoughts on the matter as I work towards designing a SQL storage backend. Your mileage may vary.
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
You might note, dedupe only dedupes data that is written after the flag is set. It does not retroactively dedupe already written data.
Re: [zfs-discuss] dedup screwing up snapshot deletion
Thank you for the corrections. Also I forgot about using an SSD to assist. My bad. =)
Re: [zfs-discuss] dedup screwing up snapshot deletion
This sounds like the known issue of the dedupe map not fitting in RAM. When blocks are freed, dedupe scans the whole map to ensure each block is not in use before releasing it. This takes a very long time if the map doesn't fit in RAM. If you can, try adding more RAM to the system.
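To get a feel for how much RAM that map (the DDT) needs, a commonly cited community rule of thumb is on the order of a few hundred bytes of RAM per unique block; the 320-byte figure and the pool numbers below are illustrative assumptions, not from this thread:

```shell
# Rough DDT RAM sizing using the oft-quoted ~320 bytes per entry rule of thumb.
# All numbers here are hypothetical examples.
pool_data_gb=1024      # deduped data in the pool
avg_block_kb=64        # average block size
bytes_per_entry=320    # rule-of-thumb in-core DDT entry size
entries=$((pool_data_gb * 1024 * 1024 / avg_block_kb))
ddt_mb=$((entries * bytes_per_entry / 1024 / 1024))
echo "unique blocks: $entries, approx DDT RAM: ${ddt_mb} MB"
```

Under these assumptions a 1 TB pool of 64 KB blocks wants roughly 5 GB of RAM just for the DDT, which is why freeing lots of deduped blocks on a RAM-starved box crawls.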
Re: [zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
Just as an FYI, not all drives like SAS expanders. As an example, we had a lot of trouble with Indilinx MLC based SSDs. The systems had Adaptec 52445 controllers and Chenbro SAS expanders. In the end we had to remove the SAS expanders and put a 2nd 52445 in each system to get them to work properly.
Re: [zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?
Any hints as to where you read that? I'm working on another system design with LSI controllers and being able to use SAS expanders would be a big help.
Re: [zfs-discuss] zfs send hangs
I had some issues with direct send/receives myself. In the end I elected to send to a gz file and then scp that file across, receiving from the file on the other side. This has been working fine 3 times a day for about 6 months now. Two sets of systems are doing this so far: a set running b111b and a set running b133.
Re: [zfs-discuss] about backup and mirrored pools
Mirrored sets do protect against disk failure, but most of the time you'll find proper backups are better, as most issues are more on the order of "oops" than "blowed up sir". Perhaps mirrored sets with daily snapshots, and a knowledge of how to mount snapshots as clones so that you can pull a copy of that file you deleted 3 days ago. :) If you're especially paranoid, a 3-way mirror set with copies set to 2. =)
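The mount-a-snapshot-as-a-clone trick mentioned above looks roughly like this. Dataset, snapshot and file names are hypothetical placeholders:

```shell
# Hypothetical names: pull a deleted file back out of a 3-day-old snapshot
# by cloning it into a writable, mounted filesystem.
restore_from_snapshot() {
  zfs clone tank/home@3daysago tank/restore   # clone mounts at /tank/restore
  cp /tank/restore/lost-file /tank/home/      # copy the file back
  zfs destroy tank/restore                    # discard the clone when done
}
```

For a read-only copy you can often skip the clone entirely and grab the file straight out of the hidden .zfs/snapshot directory under the filesystem's mountpoint.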
Re: [zfs-discuss] VMware client solaris 10, RAW physical disk and zfs snapshots problem - all created snapshots are equal to zero.
What size is the gz file if you do an incremental send to a file? Something like: zfs send -i sn...@vol sn...@vol | gzip > /somplace/somefile.gz
Re: [zfs-discuss] b134 - Mirrored rpool won't boot unless both mirrors are present
Exactly where in the menu.lst would I put the -r ? Thanks in advance.
Re: [zfs-discuss] ZFS RaidZ to RaidZ2
zfs send s...@oldpool | zfs receive newpool
Re: [zfs-discuss] RAIDZ2 configuration
I think I would do 3 x raidz3 with 8 disks each and 0 hot spares. That way you have a better chance of resolving bit rot issues that might become apparent during a rebuild.
Re: [zfs-discuss] pool use from network poor performance
Awesome, glad to hear that you got it figured out. :)
Re: [zfs-discuss] RAIDZ2 configuration
Well, the thing I like about raidz3 is that even with 1 drive out, every block still has two levels of parity. So if you encounter bit rot, not only can checksums be used to find the good data, you can still reconstruct it from the remaining redundancy. As to performance, all I can say is test, test, test. Pick your top 3 contenders, install bonnie++, and then test each configuration. Then make your decision based on your personal balance between performance, space and reliability. Due to space considerations I had to choose raidz2 over raidz3; I just couldn't give up that last drive's worth of space. Oddly enough, in my environment I got better performance out of raidz2 than I did out of 7 mirrors striped together. It may be because they are all 250 GB SSDs. I think the bottleneck in my case is either the Adaptec RAID card or CPU.
Re: [zfs-discuss] pool use from network poor performance
What does prstat show? We had a lot of trouble here using iSCSI and zvols, due to the CPU capping out with speeds less than 20 MB/sec. After simply switching to QLogic fibre HBAs and a file-backed LU, we went to 160 MB/sec on that same test platform.
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
Not quite brave enough to put dedup into production here. Concerned about the issues some folks have had when releasing large numbers of blocks in one go.
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
The way we do this here is:

zfs snapshot voln...@snapnow
[i]#code to break on error and email not shown.[/i]
zfs send -i voln...@snapbefore voln...@snapnow | pigz -p4 -1 > file
[i]#code to break on error and email not shown.[/i]
scp /dir/file u...@remote:/dir/file
[i]#code to break on error and email not shown.[/i]
ssh u...@remote gzip -t /dir/file
[i]#code to break on error and email not shown.[/i]
ssh u...@remote 'gunzip -c /dir/file | zfs receive volname'

It works for me, and it sends a minimum amount of data across the wire, which is tested to minimize the chance of in-flight issues. Except on Sundays, when we do a full send.
Re: [zfs-discuss] ZFS send and receive corruption across a WAN link?
No, but I'm slightly paranoid that way. ;)