Re: [zfs-discuss] SSD and ZFS
Thanks Brendan, I was going to move it over to an 8 KB record size once I got through this index rebuild. My thinking was that a disproportionate block size would show up as excessive I/O throughput, not a lack of throughput. The question about the cache comes from the fact that the 18 GB or so that it says is in the cache IS the database. This was why I was thinking the index rebuild should be CPU constrained, and I should see a spike in reading from the cache. If the entire file is cached, why would it go to the disks at all for the reads? The disks are delivering about 30 MB/s of reads, but this SSD is rated for sustained 70 MB/s, so there should be a chance to pick up a 100% gain. I've seen lots of mention of kernel settings, but those only seem to apply to cache flushes on sync writes. Any idea on where to look next? I've spent about a week tinkering with it. I'm trying to get a major customer to switch over to ZFS and an open storage solution, but I'm afraid if I can't get it to work in the small scale, I can't convince them about the large scale.

Thanks, Tracey

On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems bren...@sun.com wrote:

On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:

I have a similar question. I put together a cheapo RAID with four 1TB WD Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 (5GB) for ZIL and the rest of the SSD for cache:

# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c0t0d1    ONLINE       0     0     0
            c0t0d2    ONLINE       0     0     0
            c0t0d3    ONLINE       0     0     0
        logs
          c0t0d4s0    ONLINE       0     0     0
        cache
          c0t0d4s1    ONLINE       0     0     0
        spares
          c0t0d6      AVAIL
          c0t0d7      AVAIL

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       72.1G  3.55T    237     12  29.7M   597K
  raidz1    72.1G  3.55T    237      9  29.7M   469K
    c0t0d0      -      -    166      3  7.39M   157K
    c0t0d1      -      -    166      3  7.44M   157K
    c0t0d2      -      -    166      3  7.39M   157K
    c0t0d3      -      -    167      3  7.45M   157K
  c0t0d4s0     20K  4.97G     0      3      0   127K
cache           -      -      -      -      -      -
  c0t0d4s1  17.6G  36.4G      3      1   249K   119K
----------  -----  -----  -----  -----  -----  -----

I just don't seem to be getting the bang for the buck I should be. This was taken while rebuilding an Oracle index, all files stored in this pool. The WD disks are at 100%, and nothing is coming from the cache. The cache does have the entire DB cached (17.6G used), but hardly reads anything from it. I also am not seeing the spike of data flowing into the ZIL either, although iostat shows there is just write traffic hitting the SSD:

                    extended device statistics                    cpu
device    r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0     170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1     168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
sd2     172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
sd3       0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
sd4     170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
sd5       1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31

Since this SSD is in a RAID array, and just presents as a regular disk LUN, is there a special incantation required to turn on the Turbo mode? Doesn't it seem that all this traffic should be maxing out the SSD? Reads from the cache, and writes to the ZIL? I have a second identical SSD I wanted to add as a mirror, but it seems pointless if there's no zip to be had.

The most likely reason is that this workload has been identified as streaming by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1). It also looks like you've used a 128 Kbyte ZFS record size. Is Oracle doing 128 Kbyte random I/O?
We usually tune that down before creating the database, which will use the L2ARC device more efficiently. Brendan

-- Brendan Gregg, Fishworks http://blogs.sun.com/brendan

-- Tracey Bernath 913-488-6284
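For reference, dropping the record size has to happen before the datafiles are written, since recordsize only applies to newly created files. A minimal sketch (the dataset name dpool/oradata is an assumption for illustration, not something from the original post):

    # set an 8 KB record size to match Oracle's db_block_size
    zfs set recordsize=8k dpool/oradata
    zfs get recordsize dpool/oradata

Files already in the dataset keep their old record size, so the datafiles would need to be recreated or copied back in after the change.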
Re: [zfs-discuss] Another user loses his pool (10TB in this case) and 40 days of work
I had a very similar problem. 8 external USB drives running OpenSolaris native. When I moved the machine into a different room and powered it back up (there were a couple of reboots and a couple of broken USB cables and drive shutdowns in between), I got the same error. Losing that much data is definitely a shock. I'm running raidz2 and I would have assumed that two levels of redundancy should be fine to toss a lot of roughness at the pool. After panicking a little, stressing my family out, and some playing with zdb that led nowhere, I did a

zpool export mypool
zpool import mypool

It complained about being unable to mount because the mount point was not empty, so I did

umount /mypool/mypool
zfs mount mypool/mypool
zpool status mypool

and to my relieved surprise it seems all fine. ls /mypool/mypool does show data. Scrub is running right now to be on the safe side. Thought that may help some folks out there. Cheers! Andy
Re: [zfs-discuss] Another user loses his pool (10TB in this case) and 40 days of work
I just have to say this, and I don't mean it in a bad way... If you really care about your data, why then use USB drives with loose cables and (apparently) no backup? USB-connected drives for data backup are okay, and for playing around and getting to know ZFS they also seem okay. Using them for online data that you care about and expecting them to be reliable... it's just not the right technology for that, IMHO.

..Remco

On 2/13/10 11:23 AM, Andy Stenger wrote:

I had a very similar problem. 8 external USB drives running OpenSolaris native. When I moved the machine into a different room and powered it back up (there were a couple of reboots and a couple of broken USB cables and drive shutdowns in between), I got the same error. Losing that much data is definitely a shock. I'm running raidz2 and I would have assumed that two levels of redundancy should be fine to toss a lot of roughness at the pool. After panicking a little, stressing my family out, and some playing with zdb that led nowhere, I did a

zpool export mypool
zpool import mypool

It complained about being unable to mount because the mount point was not empty, so I did

umount /mypool/mypool
zfs mount mypool/mypool
zpool status mypool

and to my relieved surprise it seems all fine. ls /mypool/mypool does show data. Scrub is running right now to be on the safe side. Thought that may help some folks out there. Cheers! Andy
[zfs-discuss] Oracle Performance - ZFS vs UFS
Was wondering if anyone has had any performance issues with Oracle running on ZFS as compared to UFS? Thanks
[zfs-discuss] ZFS performance benchmarks in various configurations
I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)

My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of RAM. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with.

I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0

Configurations being tested (example pool layouts below):
· Single disk
· 2-way mirror
· 3-way mirror
· 4-way mirror
· 5-way mirror
· 6-way mirror
· Two mirrors striped (or concatenated)
· Three mirrors striped (or concatenated)
· 5-disk raidz
· 6-disk raidz
· 6-disk raidz2

Hypothesized results:
· N-way mirrors write at the same speed of a single disk
· N-way mirrors read n-times faster than a single disk
· Two mirrors striped read and write 2x faster than a single mirror
· Three mirrors striped read and write 3x faster than a single mirror
· Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it's slower than a single disk. Waiting to see the results.
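To make a few of those configurations concrete, this is roughly how they would be created (the disk names c1t1d0 through c1t6d0 are placeholders, not the actual devices on this server):

    zpool create tank mirror c1t1d0 c1t2d0                               # 2-way mirror
    zpool create tank mirror c1t1d0 c1t2d0 mirror c1t3d0 c1t4d0          # two mirrors striped
    zpool create tank raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0           # 5-disk raidz
    zpool create tank raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0   # 6-disk raidz2

Each pool would be destroyed (zpool destroy tank) between runs so the layouts don't overlap on the same disks.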
Re: [zfs-discuss] zfs import fails even though all disks are online
Mark J Musante wrote:

On Thu, 11 Feb 2010, Cindy Swearingen wrote:

On 02/11/10 04:01, Marc Friesacher wrote:

fr...@vault:~# zpool import
  pool: zedpool
    id: 10232199590840258590
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        zedpool        ONLINE
          raidz1       ONLINE
            c4d0       ONLINE
            c5d0       ONLINE
            c6d0       ONLINE
            c7d0       ONLINE
        logs
        zedpool        ONLINE
          mirror       ONLINE
            c12t0d0p0  ONLINE
            c10t0d0p0  ONLINE

Is this the actual unedited config output? I've never seen the name of the pool show up after logs.

I've looked into it and think this is 6599442 "zpool import has faults in the display", which is fixed in build 116, whereas the system is running build 111b.

One thing you can try is to use dtrace to look at any ldi_open_by_name(), ldi_open_by_devid(), or ldi_open_by_dev() calls that zfs makes. That may give a clue as to what's going wrong.

fmdump -eV suggests that there are issues with pool-wide metadata objects, so "device" in "cannot import 'zedpool': one or more devices is currently unavailable" probably refers to the raidz top-level vdev. Pool recovery would help to recover this pool.

regards, victor

Regards, markm
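For anyone following along, a rough sketch of the kind of one-liner Mark is describing (untested, and the probe and argument layout here are my assumption, not something from his mail) would trace the device paths ZFS tries to open during the import:

    dtrace -n 'fbt::ldi_open_by_name:entry { printf("%s", stringof(arg0)); }' -c 'zpool import zedpool'

The same idea applies to fbt::ldi_open_by_dev:entry and fbt::ldi_open_by_devid:entry, or to their :return probes with trace(arg1) to see which opens come back with errors.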
Re: [zfs-discuss] SSD and ZFS
comment below...

On Feb 12, 2010, at 2:25 PM, TMB wrote:

I have a similar question. I put together a cheapo RAID with four 1TB WD Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 (5GB) for ZIL and the rest of the SSD for cache:

# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c0t0d1    ONLINE       0     0     0
            c0t0d2    ONLINE       0     0     0
            c0t0d3    ONLINE       0     0     0
        logs
          c0t0d4s0    ONLINE       0     0     0
        cache
          c0t0d4s1    ONLINE       0     0     0
        spares
          c0t0d6      AVAIL
          c0t0d7      AVAIL

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       72.1G  3.55T    237     12  29.7M   597K
  raidz1    72.1G  3.55T    237      9  29.7M   469K
    c0t0d0      -      -    166      3  7.39M   157K
    c0t0d1      -      -    166      3  7.44M   157K
    c0t0d2      -      -    166      3  7.39M   157K
    c0t0d3      -      -    167      3  7.45M   157K
  c0t0d4s0     20K  4.97G     0      3      0   127K
cache           -      -      -      -      -      -
  c0t0d4s1  17.6G  36.4G      3      1   249K   119K
----------  -----  -----  -----  -----  -----  -----

I just don't seem to be getting the bang for the buck I should be. This was taken while rebuilding an Oracle index, all files stored in this pool. The WD disks are at 100%, and nothing is coming from the cache. The cache does have the entire DB cached (17.6G used), but hardly reads anything from it. I also am not seeing the spike of data flowing into the ZIL either, although iostat shows there is just write traffic hitting the SSD:

                    extended device statistics                    cpu
device    r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0     170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1     168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
sd2     172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
sd3       0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
sd4     170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
sd5       1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31

iostat has a -n option, which is very useful for looking at device names :-)

The SSD here is performing well. The rest are clobbered. A 205 millisecond response time will be agonizingly slow. By default, for this version of ZFS, up to 35 I/Os will be queued to the disk, which is why you see 35.0 in the actv column. The combination of actv=35 and svc_t > 200 indicates that this is the place to start working. Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4. This will reduce the concurrent load on the disks, thus reducing svc_t.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
-- richard

Since this SSD is in a RAID array, and just presents as a regular disk LUN, is there a special incantation required to turn on the Turbo mode? Doesn't it seem that all this traffic should be maxing out the SSD? Reads from the cache, and writes to the ZIL? I have a second identical SSD I wanted to add as a mirror, but it seems pointless if there's no zip to be had. Help?

Thanks, Tracey
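For anyone wanting to try Richard's suggestion, the tunable for this era of ZFS can be changed on a live system with mdb or made persistent in /etc/system (a sketch only; the value 4 is just an example, and the Evil Tuning Guide linked above is the reference to follow):

    # live change, takes effect immediately, lost at reboot
    echo zfs_vdev_max_pending/W0t4 | mdb -kw

    # persistent: add to /etc/system and reboot
    set zfs:zfs_vdev_max_pending = 4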
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS
On Feb 13, 2010, at 5:23 AM, Tony MacDoodle wrote: Was wondering if anyone has had any performance issues with Oracle running on ZFS as compared to UFS?

The ZFS for Databases wiki is the place to collect information and advice for databases on ZFS. http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases I notice that it is missing some later research results and will try to update it over the next few days. ZFS can perform better or worse than UFS. Follow the recommendations for configuration with your database to avoid wasting time rediscovering the new world :-)
-- richard
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Some thoughts below...

On Feb 13, 2010, at 6:06 AM, Edward Ned Harvey wrote:

I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like “striping mirrors is faster than raidz” and so on. Would anybody like me to test any particular configuration? Unfortunately I don’t have any SSD, so I can’t do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5” SAS SSD they wouldn’t mind lending for a few hours. ;-)

My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of ram. (Easier to benchmark disks when the file operations aren’t all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with.

Put the memory back in and limit the ARC cache size instead. x86 boxes have a tendency to change the memory bus speed depending on how much memory is in the box. Similarly, you can test primarycache settings rather than just limiting ARC size.

I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0

IMHO, sequential tests are a waste of time. With default configs, it will be difficult to separate the raw performance from prefetched performance. You might try disabling prefetch as an option. With sync writes, you will run into the zfs_immediate_write_sz boundary. Perhaps someone else can comment on how often they find interesting sequential workloads which aren't backup-related.

Configurations being tested:
· Single disk
· 2-way mirror
· 3-way mirror
· 4-way mirror
· 5-way mirror
· 6-way mirror
· Two mirrors striped (or concatenated)
· Three mirrors striped (or concatenated)
· 5-disk raidz
· 6-disk raidz
· 6-disk raidz2

Please add some raidz3 tests :-) We have little data on how raidz3 performs.

Hypothesized results:
· N-way mirrors write at the same speed of a single disk
· N-way mirrors read n-times faster than a single disk
· Two mirrors striped read and write 2x faster than a single mirror
· Three mirrors striped read and write 3x faster than a single mirror
· Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it’s slower than a single disk. Waiting to see the results.

Please post results (with raw data would be nice ;-). If you would be so kind as to collect samples of iosnoop -Da I would be eternally grateful :-)
-- richard
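A quick sketch of what "limit the ARC" and "test primarycache settings" look like in practice (the 4 GB cap and the pool/dataset name are placeholders I picked, not values anyone in this thread suggested):

    # cap the ARC at 4 GB: add to /etc/system and reboot
    set zfs:zfs_arc_max = 0x100000000

    # per-dataset cache policy, changeable on the fly between runs
    zfs set primarycache=metadata tank/bench
    zfs set primarycache=all tank/bench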
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS
Using ZFS for Oracle can be configured to deliver very good performance. Depending on what your priorities are in terms of critical metrics, keep in mind that the most performant solution is to use Oracle ASM on raw disk devices. That is not intended to imply anything negative about ZFS or UFS. The simple fact is that when you put your Oracle datafiles on any file system, there's a much longer code path involved in reading and writing files, along with the file system's use of memory that needs to be considered. ZFS offers enterprise-class features (the admin model, snapshots, etc.) that make it a great choice to deploy in production, but, from a pure performance point of view, it's not going to be the absolute fastest. Configured correctly, it can meet or exceed performance requirements.

For Oracle, you need to:
- Make sure you're on the latest Solaris 10 update release (update 8).
- For the datafiles, set the recordsize to align with the db_block_size (8k).
- Put the redo logs on a separate zpool, with the default 128k recordsize.
- Disable ZFS data caching (primarycache=metadata). Let Oracle cache the data in the SGA.
- Watch your space in your zpools - don't run them at 90% full.

Read the link Richard sent for some additional information.

Thanks, /jim

Tony MacDoodle wrote: Was wondering if anyone has had any performance issues with Oracle running on ZFS as compared to UFS? Thanks
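As a concrete illustration of those bullet points, the dataset setup might look like the following (the pool, dataset, and disk names are made up for the example, not anything Jim specified):

    # datafiles: 8k records to match db_block_size, metadata-only ARC caching
    zfs create -o recordsize=8k -o primarycache=metadata dbpool/oradata

    # redo logs on their own pool, left at the default 128k recordsize
    zpool create redopool mirror c3t0d0 c3t1d0
    zfs create redopool/redo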
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write. iozone -Reab somefile.wks -g 17G -i 1 -i 0

Make sure to also test with a command like

iozone -m -t 8 -T -O -r 128k -o -s 12G

I am eager to read your test report.

Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Bob Friesenhahn wrote: Make sure to also test with a command like iozone -m -t 8 -T -O -r 128k -o -s 12G

Actually, it seems that this is more than sufficient:

iozone -m -t 8 -T -r 128k -o -s 4G

since it creates a 4GB test file for each thread, with 8 threads.

Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] available space
I have the following pool:

NAME   SIZE   USED   AVAIL   CAP  HEALTH  ALTROOT
OIRT  6.31T  3.72T   2.59T   58%  ONLINE  /

zfs list shows the following for a typical file system:

NAME                    USED  AVAIL  REFER  MOUNTPOINT
OIRT/sakai/production  1.40T  1.77T  1.40T  /OIRT/sakai/production

Why is available lower when shown by zfs than zpool?
Re: [zfs-discuss] available space
One shows pool size, one shows filesystem size. The pool size is based on raw space. The zfs list output shows how much is used and how much usable space is available. For instance, I use raidz2 with 1TB drives, so if I do zpool list I see ALL the space, including parity, but if I do zfs list I only see how much space the filesystem sees. Two different tools for two different jobs.

On Sat, Feb 13, 2010 at 12:28 PM, Charles Hedrick hedr...@rutgers.edu wrote: I have the following pool: NAME SIZE USED AVAIL CAP HEALTH ALTROOT OIRT 6.31T 3.72T 2.59T 58% ONLINE / zfs list shows the following for a typical file system: NAME USED AVAIL REFER MOUNTPOINT OIRT/sakai/production 1.40T 1.77T 1.40T /OIRT/sakai/production Why is available lower when shown by zfs than zpool?
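A back-of-the-envelope illustration of the difference (the layout here is assumed for the example; the original post never says how OIRT is built). On a hypothetical 6-disk raidz2 of 1 TB drives:

    zpool list SIZE/AVAIL : counts every device, parity included
    zfs list AVAIL        : roughly 4/6 of the raw free space, since two drives' worth goes to parity

A parity ratio like that would also be consistent with OIRT showing 2.59 T free at the pool level but only about 1.77 T available to the file systems (2.59 T x 2/3 is roughly 1.73 T).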
Re: [zfs-discuss] Pool import with failed ZIL device now possible ?
I have a similar situation. I have a system that is used for backup copies of logs and other non-critical things, where the primary copy is on a NetApp. Data gets written in batches a few times a day. We use this system because storage on it is a lot less expensive than on the NetApp. It's only non-critical data that is sent via NFS. Critical data is sent to this server either by zfs send | receive, or by an rsync running on the server that reads from the NetApp over NFS. Thus the important data shouldn't go through the ZIL.

I am seriously considering turning off the ZIL, because NFS write performance is so lousy. I'd use an SSD, except that I can't find a reasonable way of doing so. I have a pair of servers with Sun Cluster, sharing a J4200 JBOD. If there's a failure, operations move to the other server. Thus a local SSD is no better than a disabled ZIL. I'd love to put an SSD in the J4200, but the claim that this was going to be supported seems to have vanished.

Someone once asked why I bother with redundant systems if I don't care about the data. The answer is that if the NFS mounts hang, my production services hang. Also, I do care about some of the data. It just happens not to go through the ZIL.
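For context, at the time of this thread there was no per-dataset sync property, so "turning off the ZIL" generally meant a global tunable; a rough sketch of how that was typically done (this disables synchronous write semantics for every pool on the host, which is exactly why it only makes sense for data you can afford to lose):

    # add to /etc/system and reboot (global, affects all pools)
    set zfs:zil_disable = 1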
Re: [zfs-discuss] zfs/sol10u8 less stable than in sol10u5?
We recently patched our X4500 from Sol10 U6 to Sol10 U8 and have not noticed anything like what you're seeing. We do not have any SSD devices installed.
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
IMHO, sequential tests are a waste of time. With default configs, it will be difficult to separate the raw performance from prefetched performance. You might try disabling prefetch as an option.

Let me clarify: Iozone does a nonsequential series of sequential tests, specifically for the purpose of identifying the performance tiers, separating the various levels of hardware-accelerated performance from the raw disk performance. This is the reason why I took out all but 4G of the system RAM. In the (incomplete) results I have so far, it's easy to see these tiers for a single disk:
. For file sizes 0 to 4M, a single disk writes 2.8 Gbit/sec and reads ~40-60 Gbit/sec. This boost comes from writing to PERC cache, and reading from CPU L2 cache.
. For file sizes 4M to 128M, a single disk writes 2.8 Gbit/sec and reads 24 Gbit/sec. This boost comes from writing to PERC cache, and reading from system memory.
. For file sizes 128M to 4G, a single disk writes 1.2 Gbit/sec and reads 24 Gbit/sec. This boost comes from reading system memory.
. For file sizes 4G to 16G, a single disk writes 1.2 Gbit/sec and reads 1.2 Gbit/sec. This is the raw disk performance. (SAS, 15krpm, 146G disks)

Please add some raidz3 tests :-) We have little data on how raidz3 performs.

Does this require a specific version of OS? I'm on Solaris 10 10/09, and man zpool doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist?

Please post results (with raw data would be nice ;-). If you would be so kind as to collect samples of iosnoop -Da I would be eternally grateful :-)

I'm guessing iosnoop is an opensolaris thing? Is there an equivalent for Solaris?

I'll post both the raw results, and my simplified conclusions. Most people would not want the raw data. Most people just want to know "What's the performance hit I take by using raidz2 instead of raidz?" and so on. Or ... "What's faster, raidz, or hardware raid-5?"
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS
On Sat, Feb 13, 2010 at 9:58 AM, Jim Mauro james.ma...@sun.com wrote: Using ZFS for Oracle can be configured to deliver very good performance. Depending on what your priorities are in terms of critical metrics, keep in mind that the most performant solution is to use Oracle ASM on raw disk devices. That is not intended to imply anything negative about ZFS or UFS. The simple fact is that when you put your Oracle datafiles on any file system, there's a much longer code path involved in reading and writing files, along with the file system's use of memory that needs to be considered. ZFS offers enterprise-class features (the admin model, snapshots, etc.) that make it a great choice to deploy in production, but, from a pure performance point of view, it's not going to be the absolute fastest. Configured correctly, it can meet or exceed performance requirements. For Oracle, you need to: - Make sure you're on the latest Solaris 10 update release (update 8). - For the datafiles, set the recordsize to align with the db_block_size (8k). - Put the redo logs on a separate zpool, with the default 128k recordsize. - Disable ZFS data caching (primarycache=metadata). Let Oracle cache the data in the SGA. - Watch your space in your zpools - don't run them at 90% full. Read the link Richard sent for some additional information.

There is of course the caveat of using raw devices with databases (it becomes harder to track usage, especially as the number of LUNs increases, and there's slightly less visibility into their usage statistics at the OS level). However, perhaps now someone can implement the CR I filed a long time ago to add ASM support to libfstyp.so, which would allow zfs, mkfs, format, etc. to identify ASM volumes =)

Thanks, /jim Tony MacDoodle wrote: Was wondering if anyone has had any performance issues with Oracle running on ZFS as compared to UFS? Thanks
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)
There is of course the caveat of using raw devices with databases (it becomes harder to track usage, especially as the number of LUNs increases, slightly less visibility into their usage statistics at the OS level). However perhaps now someone can implement the CR I filed a long time ago to add ASM support to libfstyp.so that would allow zfs, mkfs, format, etc. to identify ASM volumes =)

While that would be nice, I would submit that if using ASM, usage becomes solely a DBA problem. From the OS level, as a system admin, I don't really care…I refer any questions back to the DBA. They should have tools to deal with all that. OTOH, with more things stacked on more servers (zones, etc.) I might care if there's a chance of whatever Oracle is doing affecting performance elsewhere. Thoughts?
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote: kind as to collect samples of iosnoop -Da I would be eternally grateful :-) I'm guessing iosnoop is an opensolaris thing? Is there an equivalent for solaris?

Iosnoop is part of the DTrace Toolkit by Brendan Gregg, which does work on Solaris 10. See http://www.brendangregg.com/dtrace.html.

Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
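For the benchmark runs, capturing what Richard asked for would just be a matter of running the script alongside iozone and saving its output; a sketch (the output path and file name are arbitrary choices of mine):

    # from an unpacked DTrace Toolkit, while the benchmark is running
    ./iosnoop -Da > /var/tmp/iosnoop-raidz2.out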
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)
My problem is when you have 100+ LUNs divided between OS and DB, keeping track of what's for what can become problematic. It becomes even worse when you start adding LUNs -- the chance of accidentally grabbing a DB LUN instead of one of the new ones is non-trivial (then there's also the chance that your storage guy might make a mistake and accidentally give you LUNs already mapped elsewhere -- which I have seen happen before). And when you're forced to do it at 3am after already working 12 hours that day, well, safeguards are a good thing.

On Sat, Feb 13, 2010 at 2:13 PM, Allen Eastwood mi...@paconet.us wrote: There is of course the caveat of using raw devices with databases (it becomes harder to track usage, especially as the number of LUNs increases, slightly less visibility into their usage statistics at the OS level). However perhaps now someone can implement the CR I filed a long time ago to add ASM support to libfstyp.so that would allow zfs, mkfs, format, etc. to identify ASM volumes =) While that would be nice, I would submit that if using ASM, usage becomes solely a DBA problem. From the OS level, as a system admin, I don't really care…I refer any questions back to the DBA. They should have tools to deal with all that. OTOH, with more things stacked on more servers (zones, etc.) I might care if there's a chance of whatever Oracle is doing affecting performance elsewhere. Thoughts?
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS (Jason King)
So, one of the tricks I've used in the past is to assign a volname in format as I use LUNs. Dunno if that's an option with ASM? ZFS seems to blow those away, the last time I looked.

-A

On Feb 13, 2010, at 14:32, Jason King wrote: My problem is when you have 100+ LUNs divided between OS and DB, keeping track of what's for what can become problematic. It becomes even worse when you start adding LUNs -- the chance of accidentally grabbing a DB LUN instead of one of the new ones is non-trivial (then there's also the chance that your storage guy might make a mistake and accidentally give you LUNs already mapped elsewhere -- which I have seen happen before). And when you're forced to do it at 3am after already working 12 hours that day, well, safeguards are a good thing. On Sat, Feb 13, 2010 at 2:13 PM, Allen Eastwood mi...@paconet.us wrote: There is of course the caveat of using raw devices with databases (it becomes harder to track usage, especially as the number of LUNs increases, slightly less visibility into their usage statistics at the OS level). However perhaps now someone can implement the CR I filed a long time ago to add ASM support to libfstyp.so that would allow zfs, mkfs, format, etc. to identify ASM volumes =) While that would be nice, I would submit that if using ASM, usage becomes solely a DBA problem. From the OS level, as a system admin, I don't really care…I refer any questions back to the DBA. They should have tools to deal with all that. OTOH, with more things stacked on more servers (zones, etc.) I might care if there's a chance of whatever Oracle is doing affecting performance elsewhere. Thoughts?
Re: [zfs-discuss] Oracle Performance - ZFS vs UFS
Don't use raidz for the raid type - go with a striped set
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 13, 2010, at 10:54 AM, Edward Ned Harvey wrote: Please add some raidz3 tests :-) We have little data on how raidz3 performs. Does this require a specific version of OS? I'm on Solaris 10 10/09, and man zpool doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist?

Never mind. I have no interest in performance tests for Solaris 10. The code is so old that it does not represent current ZFS at all. IMHO, if you want to do performance tests, then you need to be on the very latest dev release. Otherwise, the results can't be carried forward to make a difference -- finding performance issues that are already fixed isn't a good use of your time.
-- richard
Re: [zfs-discuss] zfs import fails even though all disks are online
The problem has been resolved by Victor. Thank you again for your time and effort yesterday. I don't think I would have ever been able to get my data back without your level of expertise and hands-on approach. As discussed last night, the important data has been backed up already and come Monday I'll be building another OS server which will be hosting a complete duplicate of all the data. A bit more costly than relying on RAIDZ alone, but at least I should never have to tell my wife that our photos, including wedding and honeymoon, are gone forever. This will also give me the opportunity to update server builds without fearing data loss, one of the reasons I was still on 111b. Thank you also to Cindy and Mark for trying to help me. Just having some things to try kept me hoping that there would be a solution. This community rocks and ZFS does too. Marc.
Re: [zfs-discuss] Reading ZFS config for an extended period
After around four days the process appeared to have stalled (no audible hard drive activity). I restarted with milestone=none, deleted /etc/zfs/zpool.cache, restarted, and went zpool import tank. (I also allowed root login to ssh, so I could make new ssh sessions if required.) Now I can watch the process from on the machine.

My present question is: how is the DDT stored? I believe the DDT to have around 10M entries for this dataset, as per:

DDT-sha256-zap-duplicate: 400478 entries, size 490 on disk, 295 in core
DDT-sha256-zap-unique: 10965661 entries, size 381 on disk, 187 in core

(taken just previous to the attempt to destroy the dataset)

A sample from iopattern shows:

%RAN  %SEQ  COUNT    MIN    MAX    AVG     KR
 100     0    195    512    512    512     97
 100     0    414    512  65536    895    362
 100     0    261    512    512    512    130
 100     0    273    512    512    512    136
 100     0    247    512    512    512    123
 100     0    297    512    512    512    148
 100     0    292    512    512    512    146
 100     0    250    512    512    512    125
 100     0    274    512    512    512    137
 100     0    302    512    512    512    151
 100     0    294    512    512    512    147
 100     0    308    512    512    512    154
  98     2    286    512    512    512    143
 100     0    270    512    512    512    135
 100     0    390    512    512    512    195
 100     0    269    512    512    512    134
 100     0    251    512    512    512    125
 100     0    254    512    512    512    127
 100     0    265    512    512    512    132
 100     0    283    512    512    512    141

As the pool is comprised of 2x 8-disk raidz vdevs, I presume that each element is stored twice (for the raidz redundancy). So at around 280 512-byte read ops/s, that's 140 entries per second.

Is the import of a semi-broken pool:
1. Reading all the DDT markers for the dataset; or
2. Reading all the DDT markers for the pool; or
3. Reading all of the block markers for the dataset; or
4. Reading all of the block markers for the pool
prior to actually finalising what it needs to do to fix the pool?

I'd like to be able to estimate the length of time likely before the import finishes. Or should I tell it to roll back to the last valid txg - i.e. before the zfs destroy dataset command was issued (by zpool import -F)? Or is this likely to take as long/longer than the present import/fix?

Cheers.
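Purely as a back-of-the-envelope estimate on my own numbers (assuming the ~140 entries/s rate holds, that a single pass is enough, and that the import has to walk every DDT entry in the pool, which is only my guess from cases 2 and 4 above):

    400478 + 10965661 = 11366139 entries
    11366139 / 140    = ~81200 seconds = roughly 22-23 hours per pass

If it only needs the ~11M entries of the destroyed dataset the figure is much the same, on the order of 20 hours; multiple passes, or a rate that falls off, would obviously stretch that out.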