Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)
On Mon, Feb 7, 2011 at 7:53 PM, Richard Elling richard.ell...@gmail.com wrote:
> On Feb 7, 2011, at 1:07 PM, Peter Jeremy wrote:
>> On 2011-Feb-07 14:22:51 +0800, Matthew Angelo bang...@gmail.com wrote:
>>> I'm actually more leaning towards running a simple 7+1 RAIDZ1. Running this with 1TB is not a problem, but I just wanted to investigate at what TB size the scales would tip.
>>
>> It's not that simple. Whilst resilver time is proportional to device size, it's far more affected by the degree of fragmentation of the pool. And there's no 'tipping point' - it's a gradual slope, so it's really up to you to decide where you want to sit on the probability curve.
>
> The tipping point won't occur for similar configurations. The tip occurs for different configurations. In particular, if the size of the N+M parity scheme is very large and the resilver times become very, very long (weeks), then an (M-1)-way mirror scheme can provide better performance and dependability. But I consider these to be extreme cases.

Empirically, it seems that resilver time is related to the number of objects as much as (if not more than) the amount of data. zpools (mirrors) with similar amounts of data but radically different numbers of objects take very different amounts of time to resilver. I have NOT (yet) started actually measuring and tracking this, but the above is based on casual observation.

P.S. I am measuring the number of objects via `zdb -d`, as that is faster than trying to count files and directories, and I expect it is a much better measure of what the underlying zfs code is dealing with (a particular dataset may have lots of snapshot data that does not (easily) show up).

-- 
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
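For reference, a minimal sketch of the kind of object count described above, assuming a pool named tank (the pool and dataset names are illustrative, not from the thread):

    # list every dataset in the pool along with its object count
    zdb -d tank

    # or narrow it to a single dataset, e.g. a home filesystem
    zdb -d tank/home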
Re: [zfs-discuss] RAID Failure Calculator (for 8x 2TB RAIDZ)
On Feb 14, 2011 6:56 AM, Paul Kraus p...@kraus-haus.org wrote:
> P.S. I am measuring the number of objects via `zdb -d`, as that is faster than trying to count files and directories, and I expect it is a much better measure of what the underlying zfs code is dealing with (a particular dataset may have lots of snapshot data that does not (easily) show up).

It's faster because: a) no atime updates, b) no ZPL overhead.

Nico
--
[zfs-discuss] existing performance data for on-disk dedup?
Hello. I am looking to see if performance data exists for on-disk dedup. I am currently in the process of setting up some tests based on input from Roch, but before I get started, thought I'd ask here.

Thanks for the help,
Janice
Re: [zfs-discuss] Very bad ZFS write performance. Ok Read.
Thanks for the responses. I found the issue. It was due to power management, and probably a bug with event-driven power management states. Changing "cpupm enable" to "cpupm enable poll-mode" in /etc/power.conf fixed the issue for me. Back up to 110MB/sec+ now.
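For reference, a minimal sketch of the change described above as it would appear in /etc/power.conf (only this line changes; the rest of the file is untouched, and pmconfig(1M) is normally run afterwards to apply the new settings):

    # before: event-driven CPU power management
    cpupm                   enable

    # after: poll-mode, the workaround described above
    cpupm                   enable poll-mode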
Re: [zfs-discuss] how to destroy a pool by id?
I have old pool skeletons with vdevs that no longer exist. Can't import them, can't destroy them, can't even rename them to something obvious like junk1. What do I do to clean up?
Re: [zfs-discuss] Very bad ZFS write performance. Ok Read.
On Sat, Feb 12, 2011 at 3:14 AM, ian W dropbears...@yahoo.com.au wrote:
> Thanks for the responses. I found the issue. It was due to power management, and probably a bug with event-driven power management states. Changing "cpupm enable" to "cpupm enable poll-mode" in /etc/power.conf fixed the issue for me. Back up to 110MB/sec+ now.

Interesting - I have an E6600 also, and I will give this a try. I left 'cpupm enable' in /etc/power.conf because powertop/prtdiag properly reported all the available P/C-states of my CPU, so I assumed that power management was good to go. What do you have cpu-threshold set to?

(This may be a moot point for me, because my CPU is littering fault management with strings of L2 cache errors, so I might be upgrading to Nehalem soon.)
Re: [zfs-discuss] existing performance data for on-disk dedup?
Hi Janice,

> Hello. I am looking to see if performance data exists for on-disk dedup. I am currently in the process of setting up some tests based on input from Roch, but before I get started, thought I'd ask here.

I find it somewhat interesting that you are asking this question on behalf of work you are doing for Roch, given that Roch posted the following blog, with references:

http://blogs.sun.com/roch/entry/dedup_performance_considerations1

That said, there is the ZFS page:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup

As far as synthetic testing of dedup goes, I found that the latest version of VDBench supports dedup, and it is helpful for narrowing in on specific issues related to the size of the DDT, the ARC and L2ARC.

http://blogs.sun.com/henk/entry/first_beta_version_of_vdbench

Jim
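As a rough starting point for that kind of test, a sketch of a vdbench parameter file using the dedup parameters introduced in the 5.03 beta; the LUN path, sizes, and ratios are assumptions, and the exact syntax should be checked against the vdbench documentation:

    * generate data that dedups roughly 2:1, deduplicated in 128k units
    dedupratio=2
    dedupunit=128k

    * a single test LUN (illustrative device path)
    sd=sd1,lun=/dev/rdsk/c0t1d0s0

    * sequential 128k writes, then random 8k reads
    wd=wd_write,sd=sd1,xfersize=128k,rdpct=0,seekpct=0
    wd=wd_read,sd=sd1,xfersize=8k,rdpct=100,seekpct=100
    rd=rd_write,wd=wd_write,iorate=max,elapsed=300,interval=5
    rd=rd_read,wd=wd_read,iorate=max,elapsed=300,interval=5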
Re: [zfs-discuss] how to destroy a pool by id?
Hi Chris,

Yes, this is a known problem and a CR is filed. I haven't tried these in a while, but consider one of the workarounds below. #1 is the most drastic, so make sure you've got the right device name; no sanity checking is done by the dd command. Other experts can comment on a better dd command.

Thanks,

Cindy

1. Wipe the disk label with a dd command. For example:

   dd if=/dev/zero of=/dev/dsk/c1t0d0s0 count=100 bs=512k

(The following two probably won't work if you're getting "device in use" messages.)

2. Force the creation of a new pool on the disk, like this:

   # zpool create -f pool c1t0d0

   Then, remove the new pool:

   # zpool destroy pool

3. Put the opposite label on the disk. If the disk has an SMI label, use format -e to force an EFI label. Or, vice versa.

On 02/12/11 03:33, chris wrote:
> I have old pool skeletons with vdevs that no longer exist. Can't import them, can't destroy them, can't even rename them to something obvious like junk1. What do I do to clean up?
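One caveat with workaround #1: ZFS keeps four copies of its vdev label, two at the start and two at the end of the device, so wiping only the first sectors can leave the trailing labels behind. A rough sketch that clears both ends, assuming the stale pool lives on c1t0d0s0 (the device name is illustrative; double-check it before running dd):

    DEV=/dev/rdsk/c1t0d0s0
    # size of slice 0 in 512-byte sectors, from the VTOC
    SECTORS=$(prtvtoc $DEV | awk '$1 == "0" { print $5 }')

    # wipe the first 1 MB (labels L0 and L1)
    dd if=/dev/zero of=$DEV bs=512 count=2048
    # wipe the last 1 MB (labels L2 and L3)
    dd if=/dev/zero of=$DEV bs=512 count=2048 oseek=$((SECTORS - 2048))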
[zfs-discuss] One LUN per RAID group
With ZFS on a Solaris server using storage on a SAN device, is it reasonable to configure the storage device to present one LUN for each RAID group? I'm assuming that the SAN and storage device are sufficiently reliable that no additional redundancy is necessary on the Solaris ZFS server. I'm also assuming that all disk management is done on the storage device.

I realize that it is possible to configure more than one LUN per RAID group on the storage device, but doesn't ZFS assume that each LUN represents an independent disk, and schedule I/O accordingly? In that case, wouldn't ZFS I/O scheduling interfere with I/O scheduling already done by the storage device? Is there any reason not to use one LUN per RAID group?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
Re: [zfs-discuss] ZFS read/write fairness algorithm for single pool
Hi Nathan, comments below...

On Feb 13, 2011, at 8:28 PM, Nathan Kroenert wrote:
> On 14/02/2011 4:31 AM, Richard Elling wrote:
>> On Feb 13, 2011, at 12:56 AM, Nathan Kroenert nat...@tuneunix.com wrote:
>>> Hi all, Exec summary: I have a situation where I'm seeing lots of large reads starving writes from being able to get through to disk.
>>
>> snip
>>
>> What is the average service time of each disk? Multiply that by the average active queue depth. If that number is greater than, say, 100ms, then the ZFS I/O scheduler is not able to be very effective because the disks are too slow. Reducing the active queue depth can help, see zfs_vdev_max_pending in the ZFS Evil Tuning Guide. Faster disks help, too.
>>
>> NexentaStor fans, note that you can do this easily, on the fly, via the Settings - Preferences - System web GUI.
>>  -- richard
>
> Hi Richard,
>
> Long time no speak! Anyhoo - See below. I'm unconvinced that faster disks would help. I think faster disks, at least in what I'm observing, would make it suck just as bad, just reading faster... ;) Maybe I'm missing something.

Faster disks always help :-)

> Queue depth is around 10 (default and unchanged since install), and average service time is about 25ms... Below are 1 second samples with iostat - while I have included only about 10 seconds, it's representative of what I'm seeing all the time.
>
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      360.9    13.0   46190.5     351.4   0.0  10.0   26.7   1  100
> sd7      342.9    12.0   43887.3     329.9   0.0  10.0   28.1   1  100

ok, we'll take sd6 as an example (the math is easy :-) ...
actv = 10
svc_t = 26.7
actv * svc_t = 267 milliseconds

This is the queue at the disk. ZFS manages its own queue for the disk, but once it leaves ZFS, there is no way for ZFS to manage it. In the case of the active queue, the I/Os have left the OS, so even the OS is unable to change what is in the queue or directly influence when the I/Os will be finished. In ZFS, the queue has a priority scheduler and does place a higher priority on async writes than async reads (since b130 or so). But what you can see is that the intermittent nature of the async writes gets stuck behind the 267 milliseconds as the queue drains the reads. [no, I'm not sure if that makes sense, try again...] If it sends reads continuously and writes occasionally, it will appear that reads have much more domination. In older releases, when the reads and writes had the same priority, this looked even worse.

>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      422.1     0.0   54025.0       0.0   0.0  10.0   23.6   1  100
> sd7      422.1     0.0   54025.0       0.0   0.0  10.0   23.6   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      370.0    11.0   47360.4     342.0   0.0  10.0   26.2   1  100
> sd7      327.0    16.0   41856.4     632.0   0.0   9.6   28.0   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      388.0     7.0   49406.4     290.0   0.0   9.8   24.8   1  100
> sd7      409.0     1.0   52350.3       2.0   0.0   9.5   23.2   1   99
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      423.0     0.0   54148.6       0.0   0.0  10.0   23.6   1  100
> sd7      413.0     0.0   52868.5       0.0   0.0  10.0   24.2   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      400.0     2.0   51081.2       2.0   0.0  10.0   24.8   1  100
> sd7      384.0     4.0   49153.2       4.0   0.0  10.0   25.7   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      401.9     1.0   51448.9       8.0   0.0  10.0   24.8   1  100
> sd7      424.9     0.0   54392.4       0.0   0.0  10.0   23.5   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      215.1   208.1   26751.9   25433.5   0.0   9.3   22.1   1  100
> sd7      189.1   216.1   24199.1   26833.9   0.0   8.9   22.1   1   91
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      295.0   162.0   37756.8   20610.2   0.0  10.0   21.8   1  100
> sd7      307.0   150.0   39292.6   19198.4   0.0  10.0   21.8   1  100
>                      extended device statistics
> device     r/s     w/s      kr/s      kw/s  wait  actv  svc_t  %w   %b
> sd6      405.0     2.0   51843.8       6.0   0.0  10.0   24.5   1  100
> sd7      408.0     3.0   52227.8      10.0   0.0  10.0   24.3   1  100
>
> Bottom line is that ZFS does not seem to be caring about getting my writes to disk when there is a heavy read workload. I have also confirmed that it's not the RAID controller either - behaviour is identical with direct attach SATA.
>
> But - to your excellent theory: Setting zfs_vdev_max_pending to 1 causes
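For anyone wanting to try the zfs_vdev_max_pending tuning mentioned above, a minimal sketch; the value 4 is illustrative (the thread reports a default of 10), the mdb form takes effect immediately on a live system, and the /etc/system line persists across reboots:

    # check the current value
    echo zfs_vdev_max_pending/D | mdb -k

    # change it on the fly, e.g. down to 4
    echo zfs_vdev_max_pending/W0t4 | mdb -kw

    # or make it persistent by adding this line to /etc/system
    set zfs:zfs_vdev_max_pending = 4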
Re: [zfs-discuss] One LUN per RAID group
On Mon, Feb 14, 2011 at 2:38 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
> I realize that it is possible to configure more than one LUN per RAID group on the storage device, but doesn't ZFS assume that each LUN represents an independent disk, and schedule I/O accordingly? In that case, wouldn't ZFS I/O scheduling interfere with I/O scheduling already done by the storage device? Is there any reason not to use one LUN per RAID group?

My empirical testing confirms both the claim that ZFS random read I/O (at the very least) scales linearly with the NUMBER of vdevs and NOT the number of spindles, and the recommendation (I believe from an Oracle white paper on using ZFS for Oracle DBs) that if you are using a hardware RAID device (with NVRAM write cache), you should configure one LUN per spindle in the backend RAID set.

In other words, if you build a zpool with one vdev of 10GB and another with two vdevs each of 5GB (both coming from the same array and RAID set), you get almost exactly twice the random read performance from the 2x5 zpool vs. the 1x10 zpool. Also, using a 2540 disk array set up as a 10 disk RAID6 (with 2 hot spares), you get substantially better random read performance using 10 LUNs vs. 1 LUN. While inconvenient, this just reflects the scaling of ZFS with the number of vdevs and not spindles.

I suggest performing your own testing to ensure you have the performance to handle your specific application load. Now, as to reliability, the hardware RAID array cannot detect silent corruption of data the way the end-to-end ZFS checksum can.

-- 
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
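A concrete sketch of the comparison described above, with hypothetical LUN device names (the two tank2 LUNs are assumed to be carved from the same RAID group as the single tank1 LUN):

    # one 10 GB LUN from the RAID group -> a single-vdev pool
    zpool create tank1 c4t0d0

    # two 5 GB LUNs from the same RAID group -> a two-vdev (striped) pool
    zpool create tank2 c4t1d0 c4t2d0

    # watch per-vdev activity while running the same random read workload
    zpool iostat -v tank1 5
    zpool iostat -v tank2 5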
Re: [zfs-discuss] ACL for .zfs directory
Hi Ian,

You are correct. Previous Solaris releases displayed older POSIX ACL info on this directory. It was changed to the new ACL style with the integration of this CR:

6792884 Vista clients cannot access .zfs

Thanks,

Cindy

On 02/13/11 19:30, Ian Collins wrote:
> While scanning filesystems looking for who has read access to files, I see the ACL type of the .zfs/snapshot directory varies between releases (non-ZFS in Solaris 10, ZFS in Solaris 11 Express). Is this documented anywhere?
Re: [zfs-discuss] ACL for .zfs directory
On 02/15/11 10:14 AM, Cindy Swearingen wrote:
> Hi Ian,
>
> You are correct. Previous Solaris releases displayed older POSIX ACL info on this directory. It was changed to the new ACL style with the integration of this CR:
>
> 6792884 Vista clients cannot access .zfs

Thanks Cindy. Unfortunately bugs.opensolaris.org appears to be FUBAR, so I couldn't look it up!

-- 
Ian.
[zfs-discuss] ZFS and Virtual Disks
Hi

I wanted to get some expert advice on this. I have an ordinary hardware SAN from Promise Tech that presents the LUNs via iSCSI. I would like to use that if possible with my VMware environment where I run several Solaris / OpenSolaris virtual machines. My question is regarding the virtual disks.

1. Should I create individual iSCSI LUNs and present those to the VMware ESXi host as iSCSI storage, and then create virtual disks from there on each Solaris VM?

- or -

2. Should I (assuming this is possible) let the Solaris VM mount the iSCSI LUNs directly (that is, NOT show them as VMware storage but let the VM connect to the iSCSI across the network)?

Part of the issue is I have no idea if having a hardware RAID 5 or 6 disk set will create a problem if I then create a bunch of virtual disks and then use ZFS to create RAIDZ for the VM to use. Seems like that might be asking for trouble.

This environment is completely available to mess with (no data at risk), so I'm willing to try any option you guys would recommend. Thanks!

-- 
Mark
Re: [zfs-discuss] ZFS and Virtual Disks
On Tue, Feb 15, 2011 at 5:47 AM, Mark Creamer white...@gmail.com wrote:
> 1. Should I create individual iSCSI LUNs and present those to the VMware ESXi host as iSCSI storage, and then create virtual disks from there on each Solaris VM?
>
> - or -
>
> 2. Should I (assuming this is possible) let the Solaris VM mount the iSCSI LUNs directly (that is, NOT show them as VMware storage but let the VM connect to the iSCSI across the network)?
>
> Part of the issue is I have no idea if having a hardware RAID 5 or 6 disk set will create a problem if I then create a bunch of virtual disks and then use ZFS to create RAIDZ for the VM to use. Seems like that might be asking for trouble.

The ideal solution would be to present all disks directly as JBOD to solaris without any raid/virtualization (either from the storage or vmware).

If you use (1), you'd pretty much be giving up the data integrity check to the lower layer (SAN + ESXi). In this case you'd probably be better off simply using a stripe on the zfs side (there's not much advantage in using raidz if the block devices would reside on the same physical disks in the SAN anyway).

If you use (2), you should have the option of exporting each raw disk on the SAN as a LUN to solaris, and you can create a mirror/raidz from them. However this setup is more complicated (e.g. you need to set up the SAN in a specific way, which it may or may not be capable of), plus there's a performance overhead from the vmware virtual network.

Personally I'd choose (1), and use zfs simply for its snapshot/clone/compression capability, not for its data integrity check.

-- 
Fajar
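If option (2) is attempted, a rough sketch of what the guest side might look like, assuming the Solaris iSCSI initiator with send-targets discovery (the portal address and device names are placeholders, not from the thread):

    # point the initiator at the Promise array's iSCSI portal
    iscsiadm add discovery-address 192.168.1.50:3260
    iscsiadm modify discovery --sendtargets enable

    # create device nodes for the newly visible LUNs
    devfsadm -i iscsi

    # build the redundancy inside ZFS from two of those LUNs
    zpool create tank mirror c2t1d0 c2t2d0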
Re: [zfs-discuss] ZFS and Virtual Disks
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Mark Creamer
> 1. Should I create individual iSCSI LUNs and present those to the VMware ESXi host as iSCSI storage, and then create virtual disks from there on each Solaris VM?
>
> - or -
>
> 2. Should I (assuming this is possible) let the Solaris VM mount the iSCSI LUNs directly (that is, NOT show them as VMware storage but let the VM connect to the iSCSI across the network)?

If you do #1 you'll have a layer of vmware in between your guest machine and the storage. This will add a little overhead and possibly reduce performance slightly. If you do #2 you won't have access to snapshot features in vmware. Personally I would recommend using #2 and rely on ZFS snapshots instead of vmware snapshots. But maybe you have a good reason for using vmware snapshots... I don't want to make assumptions.

> Part of the issue is I have no idea if having a hardware RAID 5 or 6 disk set will create a problem if I then create a bunch of virtual disks and then use ZFS to create RAIDZ for the VM to use. Seems like that might be asking for trouble.

Where is there any hardware raid5 or raid6 in this system? Whenever possible, you want to allow ZFS to manage the raid... configure the hardware to just pass-thru single disk jbod to the guest... Because when ZFS detects disk errors, if ZFS has the redundancy, it can correct them. But if there are disk problems on the hardware raid, the hardware raid will never know about it and it will never be correctable except by luck.
Re: [zfs-discuss] One LUN per RAID group
On Mon, Feb 14, 2011 at 03:04:18PM -0500, Paul Kraus wrote:
> On Mon, Feb 14, 2011 at 2:38 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
>> Is there any reason not to use one LUN per RAID group?
> [...]
> In other words, if you build a zpool with one vdev of 10GB and another with two vdevs each of 5GB (both coming from the same array and RAID set), you get almost exactly twice the random read performance from the 2x5 zpool vs. the 1x10 zpool.

This finding is surprising to me. How do you explain it? Is it simply that you get twice as many outstanding I/O requests with two LUNs? Is it limited by the default I/O queue depth in ZFS? After all, all of the I/O requests must be handled by the same RAID group once they reach the storage device.

> Also, using a 2540 disk array set up as a 10 disk RAID6 (with 2 hot spares), you get substantially better random read performance using 10 LUNs vs. 1 LUN. While inconvenient, this just reflects the scaling of ZFS with the number of vdevs and not spindles.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
Re: [zfs-discuss] ZFS read/write fairness algorithm for single pool
Thanks for all the thoughts, Richard.

One thing that still sticks in my craw is that I'm not wanting to write intermittently. I'm wanting to write flat out, and those writes are being held up... Seems to me that zfs should know and do something about that without me needing to tune zfs_vdev_max_pending...

Nonetheless, I'm now at a far more balanced point than when I started, so that's a good thing. :)

Cheers,

Nathan.

On 15/02/2011 6:44 AM, Richard Elling wrote:
> snip
Re: [zfs-discuss] Very bad ZFS write performance. Ok Read.
Hello

my power.conf is as follows; any recommendations for improvement?

device-dependency-property removable-media /dev/fb
autopm                  enable
autoS3                  enable
cpu-threshold           1s
# Auto-Shutdown         Idle(min)       Start/Finish(hh:mm)     Behavior
autoshutdown            30              0:00 0:00               noshutdown
S3-support              enable
cpu_deep_idle           enable
cpupm                   enable poll-mode
Re: [zfs-discuss] One LUN per RAID group
On 2/14/2011 3:52 PM, Gary Mills wrote:
> On Mon, Feb 14, 2011 at 03:04:18PM -0500, Paul Kraus wrote:
>> On Mon, Feb 14, 2011 at 2:38 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
>>> Is there any reason not to use one LUN per RAID group?
>> [...]
>> In other words, if you build a zpool with one vdev of 10GB and another with two vdevs each of 5GB (both coming from the same array and RAID set), you get almost exactly twice the random read performance from the 2x5 zpool vs. the 1x10 zpool.
>
> This finding is surprising to me. How do you explain it? Is it simply that you get twice as many outstanding I/O requests with two LUNs? Is it limited by the default I/O queue depth in ZFS? After all, all of the I/O requests must be handled by the same RAID group once they reach the storage device.
>
>> Also, using a 2540 disk array set up as a 10 disk RAID6 (with 2 hot spares), you get substantially better random read performance using 10 LUNs vs. 1 LUN. While inconvenient, this just reflects the scaling of ZFS with the number of vdevs and not spindles.

I'm going to go out on a limb here and say that you get the extra performance under one condition: you don't overwhelm the NVRAM write cache on the SAN device head. So long as the SAN's NVRAM cache can acknowledge the write immediately (i.e. it isn't full with pending commits to backing store), then, yes, having multiple write commits coming from different ZFS vdevs will obviously give more performance than a single ZFS vdev.

That said, given that SAN NVRAM caches are true write caches (and not a ZIL-like thing), it should be relatively simple to swamp one with write requests (most SANs have little more than 1GB of cache), at which point the SAN will be blocking on flushing its cache to disk. So, if you can arrange your workload so that it never exceeds the maximum write throughput of the SAN's raid array over a defined period, then, yes, go with the multiple-LUNs-per-array setup. In particular, I would think this would be excellent for small-write/latency-sensitive applications, where the total amount of data written (over several seconds) isn't large, but where latency is critical. For larger I/O requests (or for consistent, sustained I/O of more than small amounts), all bets are off as far as any possible advantage of multiple LUNs per array.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] Very bad ZFS write performance. Ok Read.
On Feb 14, 2011, at 4:49 PM, ian W wrote:
> Hello
> my power.conf is as follows; any recommendations for improvement?

For best performance, disable power management. For certain processors and BIOSes, some combinations of power management (below the OS) are also known to be toxic. At Nexenta, current best practice is to disable C-states for Nehalems.

-- richard