Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Fri, May 7, 2010 at 4:57 AM, Brandon High wrote: > I believe that the L2ARC behaves the same as a pool with multiple > top-level vdevs. It's not typical striping, where every write goes to > all devices. Writes may go to only one device, or may avoid a device > entirely while using several others. The decision about where to place > data is done at write time, so no fixed-width stripes are created at > allocation time. There is nothing to believe or disbelieve here. Writes to the L2ARC devices are grouped and sent in sequence; a queue sorts them into larger or smaller chunks to write. The L2ARC behaves in a rotor fashion, simply sweeping writes through the available space. That's all the magic, nothing very special... To answer Mike's main question, the behavior on failure is quite simple: once some L2ARC device(s) are gone, the others continue to function. The impact is a small performance loss while the remaining devices warm up again. No serious consequences and no data loss here. Take care, folks. -- Kind regards, BM Things that are stupid at the beginning rarely end up wise. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does ZFS use large memory pages?
Hi Gary, I would not remove this line in /etc/system. We have been combating this bug for a while now on our ZFS file system running JES Commsuite 7. I would be interested in finding out how you were able to pinpoint the problem. We seem to have no worries with the system currently, but when the file system gets above 80% we see quite a number of issues, much the same as what you've had in the past: ps and prstat hanging. Are you able to tell me the IDR number that you applied? Thanks, Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On 06/05/2010 21:45, Nicolas Williams wrote: On Thu, May 06, 2010 at 03:30:05PM -0500, Wes Felter wrote: On 5/6/10 5:28 AM, Robert Milkowski wrote: sync=disabled Synchronous requests are disabled. File system transactions only commit to stable storage on the next DMU transaction group commit which can be many seconds. Is there a way (short of DTrace) to write() some data and get notified when the corresponding txg is committed? Think of it as a poor man's group commit. fsync(2) is it. Of course, if you disable sync writes then there's no way to find out for sure. If you need to know when a write is durable, then don't disable sync writes. Nico There is one way - issue a sync(2) - even with sync=disabled it will sync all filesystems and then return. Another workaround would be to create a snapshot... However, I agree with Nico - if you don't need sync=disabled then don't use it. Someone else mentioned that yet another option like sync=fsync-only would be useful, so that everything would be async except fsync() - but frankly I'm not convinced, as it would require support in your application, and at that point you already have full control of the behavior without needing sync=disabled. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
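A minimal sketch of the workarounds Robert describes, assuming a hypothetical dataset tank/fs with sync=disabled:

# cp important.dat /tank/fs/ ; sync    # sync(2) commits the pending txg before returning, even with sync=disabled
# zfs snapshot tank/fs@barrier         # creating a snapshot also forces a txg commit
# zfs destroy tank/fs@barrier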
Re: [zfs-discuss] Performance of the ZIL
On Tue, May 4, 2010 at 11:34 AM, Brandon High wrote: > On Tue, May 4, 2010 at 10:19 AM, Tony MacDoodle > wrote: > > How would one determine if I should have a separate ZIL disk? We are > using > > ZFS as the backend of our Guest Domains boot drives using LDom's. And we > are > > seeing bad/very slow write performance? > > There's a dtrace script that Richard Elling wrote called zilstat.ksh. > It's available at > http://www.richardelling.com/Home/scripts-and-programs-1/zilstat > > I'm not sure what the numbers mean (there's info at the address) but > anything other than lots of 0s indicates that the ZIL is being used. On my workstation, I peg my IOPS when using VirtualBox set to run on zvols. The zilstat line comes back with about 3000 total synchronous writes per 30 sec, which means that my disks are doing about 100 sync write IOPS. That is about the upper limit for 7200rpm disks (from what I understand). This 3000 number doesn't really change much over time when running with IO load. Disabling the ZIL, I get much better performance in terms of IO throughput. This tells me that the ZIL is the bottleneck. I will be getting an SSD soon. -- Marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
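A sketch of how numbers like those above are gathered; the interval argument is an assumption based on typical *stat tools, so check the script's usage text:

# ./zilstat.ksh 30    # print ZIL statistics every 30 seconds

With roughly 3000 synchronous writes per 30-second sample, 3000 / 30 = 100 sync write IOPS, right at the ceiling of a single 7200rpm disk.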
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On May 6, 2010, at 11:08 AM, Michael Sullivan wrote: > Well, if you are striping over multiple devices then your I/O should be spread > over the devices and you should be reading them all simultaneously rather > than just accessing a single device. Traditional striping would give 1/n > performance improvement rather than 1/1, where n is the number of disks the > stripe is spread across. In theory, for bandwidth, yes, striping does improve by N. For latency, striping adds little, and in some cases is worse. ZFS dynamic stripe tries to balance this tradeoff towards latency for HDDs by grouping blocks so that only one seek+rotate is required. More below... > The round-robin access I am referring to is the way the L2ARC vdevs appear > to be accessed. RAID-0 striping is also round-robin. > So, any given object will be taken from a single device rather than from > several devices simultaneously, which would increase the I/O throughput. So, > theoretically, a stripe spread over 4 disks would give 4 times the > performance as opposed to reading from a single disk. This also assumes the > controller can handle multiple I/O as well or that you are striped over > different disk controllers for each disk in the stripe. All modern controllers handle multiple, concurrent I/Os. > SSDs are fast, but if I can read a block from more devices simultaneously, > it will cut the latency of the overall read. OTOH, if you have to wait for N HDDs to seek+rotate, then the latency is that of the slowest disk. The classic analogy is: nine women cannot produce a baby in one month. The difference is:

ZFS dynamic stripe: latency per I/O = fixed latency of one vdev + (size / min(media bandwidth, path bandwidth))

RAID-0: latency per I/O = max(fixed latency of devices) + (size / min(N x media bandwidth, path bandwidth))

For HDDs, the media bandwidth is around 100 MB/sec for many devices, far less than the path bandwidth on a modern system. For many SSDs, the media bandwidth is close to the path bandwidth. Newer SSDs have media bandwidth > 3Gbps, but 6Gbps SAS is becoming readily available. In other words, if the path bandwidth isn't a problem, and the media bandwidth of an SSD is 3x that of a HDD, then the bandwidth requirement that dictated RAID-0 for HDDs is reduced by a factor of 3. Yet another reason why HDDs lost the performance battle. This is also why not many folks choose to use HDDs for L2ARC -- the latency gain over the pool is marginal for HDDs. This is also one reason why there is no concatenation in ZFS. -- richard -- ZFS storage and performance consulting at http://www.RichardElling.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
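Plugging illustrative numbers into the formulas above: for a 128 KB read from HDDs with ~8 ms of seek+rotate and ~100 MB/sec of media bandwidth, the dynamic stripe costs about 8 ms + 128 KB / 100 MB/sec = 8 + 1.3 = 9.3 ms, while a 4-wide RAID-0 read costs max(seek+rotate of the 4 disks) + 128 KB / 400 MB/sec, roughly 9 + 0.3 = 9.3 ms. The transfer term shrinks by N, but the seek term dominates, which is why striping buys bandwidth rather than latency.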
Re: [zfs-discuss] dedup ratio for iscsi-shared zfs dataset
Hi-- Even though the dedup property can be set on a file system basis, dedup space usage is accounted for at the pool level, using the zpool list command. My non-expert opinion is that it would be near impossible to report space usage for dedup and non-dedup file systems at the file system level. More details are in the ZFS Dedup FAQ: http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup Thanks, Cindy On 05/06/10 12:31, eXeC001er wrote: Hi. How can I get this info? Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On Thu, May 06, 2010 at 03:30:05PM -0500, Wes Felter wrote: > On 5/6/10 5:28 AM, Robert Milkowski wrote: > > >sync=disabled > >Synchronous requests are disabled. File system transactions > >only commit to stable storage on the next DMU transaction group > >commit which can be many seconds. > > Is there a way (short of DTrace) to write() some data and get > notified when the corresponding txg is committed? Think of it as a > poor man's group commit. fsync(2) is it. Of course, if you disable sync writes then there's no way to find out for sure. If you need to know when a write is durable, then don't disable sync writes. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On 5/6/10 5:28 AM, Robert Milkowski wrote: sync=disabled Synchronous requests are disabled. File system transactions only commit to stable storage on the next DMU transaction group commit which can be many seconds. Is there a way (short of DTrace) to write() some data and get notified when the corresponding txg is committed? Think of it as a poor man's group commit. Wes Felter ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] why both dedup and compression?
On Fri, 2010-05-07 at 03:10 +0900, Michael Sullivan wrote: > This is interesting, but what about iSCSI volumes for virtual machines? > > Compress or de-dupe? Assuming the virtual machine was made from a clone of > the original iSCSI or a master iSCSI volume. > > Does anyone have any real-world data on this? I would think the iSCSI volumes > would diverge quite a bit over time even with compression and/or > de-duplication. > > Just curious… > VM OS storage is an ideal candidate for dedup, and NOT compression (for the most part). VM images contain large quantities of executable files, most of which compress poorly, if at all. However, having 20 copies of the same Windows 2003 VM image makes for very nice dedup. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
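A sketch of how that policy might look for a zvol backing a VM image (pool and volume names hypothetical):

# zfs create -V 20g -o dedup=on -o compression=off tank/vm/win2003-01

Cloned guests then share their duplicate blocks while skipping the compression pass over poorly compressible executables.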
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 06/05/2010 19:08, Michael Sullivan wrote: Hi Marc, Well, if you are striping over multiple devices then your I/O should be spread over the devices and you should be reading them all simultaneously rather than just accessing a single device. Traditional striping would give 1/n performance improvement rather than 1/1, where n is the number of disks the stripe is spread across. The round-robin access I am referring to is the way the L2ARC vdevs appear to be accessed. So, any given object will be taken from a single device rather than from several devices simultaneously, which would increase the I/O throughput. So, theoretically, a stripe spread over 4 disks would give 4 times the performance as opposed to reading from a single disk. This also assumes the controller can handle multiple I/O as well or that you are striped over different disk controllers for each disk in the stripe. SSDs are fast, but if I can read a block from more devices simultaneously, it will cut the latency of the overall read. Keep in mind that the largest block is currently 128KB and you always need to read an entire block. Splitting a block across several L2ARC devices would probably decrease performance, and it would invalidate all blocks if even a single L2ARC device died. Additionally, keeping each block on only one L2ARC device allows reads from all of the L2ARC devices at the same time. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
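For reference, the 128KB ceiling Robert mentions is the dataset's recordsize (pool name hypothetical, output illustrative):

# zfs get recordsize tank
NAME  PROPERTY    VALUE  SOURCE
tank  recordsize  128K   default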
Re: [zfs-discuss] dedup ratio for iscsi-shared zfs dataset
On Thu, May 6, 2010 at 11:31 AM, eXeC001er wrote: > How can I get this info?

$ man zpool
$ zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool   111G  15.5G  95.5G  13%  1.00x  ONLINE  -
tank   7.25T  3.16T  4.09T  43%  1.12x  ONLINE  -
$ zpool get dedupratio tank
NAME  PROPERTY    VALUE  SOURCE
tank  dedupratio  1.12x  -

-B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Thu, May 6, 2010 at 11:08 AM, Michael Sullivan wrote: > The round-robin access I am referring to, is the way the L2ARC vdevs appear > to be accessed. So, any given object will be taken from a single device > rather than from several devices simultaneously, thereby increasing the I/O > throughput. So, theoretically, a stripe spread over 4 disks would give 4 I believe that the L2ARC behaves the same as a pool with multiple top-level vdevs. It's not typical striping, where every write goes to all devices. Writes may go to only one device, or may avoid a device entirely while using several others. The decision about where to place data is done at write time, so no fixed-width stripes are created at allocation time. In your example, if the file had at least four blocks there is a likelihood that it will be spread across the four top-level vdevs. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
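A sketch of the configuration under discussion, with hypothetical device names; blocks are then placed on whichever cache device the allocation rotor points at, not across fixed-width stripes:

# zpool add tank cache c4t0d0 c4t1d0 c4t2d0 c4t3d0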
Re: [zfs-discuss] Best practice for full stystem backup - equivelent of ufsdump/ufsrestore
Hi Bob, You can review the latest Solaris 10 and OpenSolaris release dates here: http://www.oracle.com/ocom/groups/public/@ocom/documents/webcontent/059542.pdf Solaris 10 release, CY2010 OpenSolaris release, 1st half CY2010 Thanks, Cindy On 05/05/10 18:03, Bob Friesenhahn wrote: On Wed, 5 May 2010, Ray Van Dolson wrote: From a zfs standpoint, Solaris 10 does not seem to be behind the currently supported OpenSolaris release. Well, being able to remove ZIL devices is one important feature missing. Hopefully in U9. :) While the development versions of OpenSolaris are clearly well beyond Solaris 10, I don't believe that the supported version of OpenSolaris (a year old already) has this feature yet either and Solaris 10 has been released several times since then already. When the forthcoming OpenSolaris release emerges in 2011, the situation will be far different. Solaris 10 can then play catch-up with the release of U9 in 2012. Bob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Fri, 7 May 2010, Michael Sullivan wrote: Well, if you are striping over multiple devices then your I/O should be spread over the devices and you should be reading them all simultaneously rather than just accessing a single device. Traditional striping would give 1/n performance improvement rather than 1/1, where n is the number of disks the stripe is spread across. This is true. Use of mirroring also improves performance, since a mirror multiplies the read performance for the same data. The value of the various approaches likely depends on the total size of the working set and the number of simultaneous requests. Currently available L2ARC SSD devices are very good with a high number of I/Os, but they are quite a bottleneck for bulk reads as compared to the L1ARC in RAM. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] dedup ratio for iscsi-shared zfs dataset
Hi. How can I get this info? Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Thu, May 6, 2010 at 1:18 AM, Edward Ned Harvey wrote: > > From the information I've been reading about the loss of a ZIL device, > What the heck? Didn't I just answer that question? > I know I said this is answered in ZFS Best Practices Guide. > > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices > > Prior to pool version 19, if you have an unmirrored log device that fails, > your whole pool is permanently lost. > Prior to pool version 19, mirroring the log device is highly recommended. > In pool version 19 or greater, if an unmirrored log device fails during > operation, the system reverts to the default behavior, using blocks from > the > main storage pool for the ZIL, just as if the log device had been > gracefully > removed via the "zpool remove" command. > This week I had a bad experience replacing an SSD device that was in a hardware RAID-1 volume. While rebuilding, the source SSD failed and the volume was brought off-line by the controller. The server kept working just fine but seemed to have switched from the 30-second interval to all writes going directly to the disks. I could confirm this with iostat. We've had some compatibility issues between LSI MegaRAID cards and a few MTRON SSDs, and I didn't believe the SSD had really died. So I brought it off-line and back on-line and everything started to work. ZFS showed the log device c3t1d0 as removed. After the RAID-1 volume was back I replaced that device with itself and a resilver process started. I don't know what it was resilvering against, but it took 2h10min. I should probably have tried a zpool offline/online too. So I think that if a log device fails AND you have to import your pool later (server rebooted, etc.)... then you lose your pool (prior to version 19). Right? This happened on OpenSolaris 2009.06. -- Giovanni ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
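A sketch of the recovery steps described above (pool name hypothetical, device name from the report):

# zpool offline tank c3t1d0    # the offline/online cycle that was suggested
# zpool online tank c3t1d0
# zpool remove tank c3t1d0     # on pool version 19 or greater, a failed unmirrored log device can simply be removed

2009.06 predates log device removal, which matches the behavior reported here.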
Re: [zfs-discuss] why both dedup and compression?
This is interesting, but what about iSCSI volumes for virtual machines? Compress or de-dupe? Assuming the virtual machine was made from a clone of the original iSCSI or a master iSCSI volume. Does anyone have any real-world data on this? I would think the iSCSI volumes would diverge quite a bit over time even with compression and/or de-duplication. Just curious… On 6 May 2010, at 16:39 , Peter Tribble wrote: > On Thu, May 6, 2010 at 2:06 AM, Richard Jahnel wrote: >> I've googled this for a bit, but can't seem to find the answer. >> >> What does compression bring to the party that dedupe doesn't cover already? > > Compression will reduce the storage requirements for non-duplicate data. > > As an example, I have a system that I rsync the web application data > from a whole > bunch of servers (zones) to. There's a fair amount of duplication in > the application > files (java, tomcat, apache, and the like) so dedup is a big win. On > the other hand, > there's essentially no duplication whatsoever in the log files, which > are pretty big, > but compress really well. So having both enabled works really well. > > -- > -Peter Tribble > http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Mike --- Michael Sullivan michael.p.sulli...@me.com http://www.kamiogi.net/ Japan Mobile: +81-80-3202-2599 US Phone: +1-561-283-2034 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Marc, Well, if you are striping over multiple devices then your I/O should be spread over the devices and you should be reading them all simultaneously rather than just accessing a single device. Traditional striping would give 1/n performance improvement rather than 1/1, where n is the number of disks the stripe is spread across. The round-robin access I am referring to is the way the L2ARC vdevs appear to be accessed. So, any given object will be taken from a single device rather than from several devices simultaneously, which would increase the I/O throughput. So, theoretically, a stripe spread over 4 disks would give 4 times the performance as opposed to reading from a single disk. This also assumes the controller can handle multiple I/O as well or that you are striped over different disk controllers for each disk in the stripe. SSDs are fast, but if I can read a block from more devices simultaneously, it will cut the latency of the overall read. On 7 May 2010, at 02:57 , Marc Nicholas wrote: > Hi Michael, > > What makes you think striping the SSDs would be faster than round-robin? > > -marc > [...]
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Hi Michael, What makes you think striping the SSDs would be faster than round-robin? -marc On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan wrote: > Everyone, > > Thanks for the help. I really appreciate it. > > Well, I actually walked through the source code with an associate today and > we found out how things work by looking at the code. > [...] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
Everyone, Thanks for the help. I really appreciate it. Well, I actually walked through the source code with an associate today and we found out how things work by looking at the code. It appears that L2ARC is just assigned in round-robin fashion. If a device goes offline, then it goes to the next and marks that one as offline. The failure to retrieve the requested object is treated like a cache miss and everything goes along its merry way, as far as we can tell. I would have hoped it to be different in some way. Like if the L2ARC were striped for performance reasons, that would be really cool, using that device as an extension of the VM model it is modeled after. Which would mean using the L2ARC as an extension of the virtual address space and striping it to make it more efficient. Way cool. If it took out the bad device and reconfigured the stripe, that would be even cooler. Replacing it with a hot spare, cooler still. However, it appears from the source code that the L2ARC is just a (sort of) jumbled collection of ZFS objects. Yes, it gives you better performance if you have it, but it doesn't really use it in a way you might expect something as cool as ZFS to. I understand why it is read-only, and that it invalidates its cached copy when a write occurs, as is to be expected for any object written. If an object is not there because of a failure or because it has been removed from the cache, it is treated as a cache miss, all well and good: go fetch from the pool. I also understand why the ZIL is important and that it should be mirrored if it is on a separate device. Though I'm wondering how a failure is handled internally when the ZIL is on its default devices; but then again, it's on a regular pool and should be redundant enough, with only some degradation in speed. Breaking these devices out from their default locations is great for performance, and I understand that. I just wish the knowledge of how they work and their internal mechanisms were not so much of a black box. Maybe that is due to the speed at which ZFS is progressing and the features it adds with each subsequent release. Overall, I am very impressed with ZFS, its flexibility, and even more so the way it breaks all the rules about how storage should be managed; I really like it. I have yet to see anything come close in its approach to disk data management. Let's just hope it keeps moving forward; it is truly a unique way to view disk storage. Anyway, sorry for the ramble, but to everyone, thanks again for the answers. Mike --- Michael Sullivan michael.p.sulli...@me.com http://www.kamiogi.net/ Japan Mobile: +81-80-3202-2599 US Phone: +1-561-283-2034 On 7 May 2010, at 00:00 , Robert Milkowski wrote: > [...] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Wed, May 5, 2010 at 8:47 PM, Michael Sullivan wrote: > While it explains how to implement these, there is no information regarding > failure of a device in a striped L2ARC set of SSD's. I have been hard > pressed to find this information anywhere, short of testing it myself, but I > don't have the necessary hardware in a lab to test correctly. If someone has > pointers to references, could you please provide them to chapter and verse, > rather than the advice to "Go read the manual." Yes, but the answer is in the man page, so reading it is a good idea: "If a read error is encountered on a cache device, that read I/O is reissued to the original storage pool device, which might be part of a mirrored or raidz configuration." > I'm running 2009.11 which is the latest OpenSolaris. I should have made that > clear, and that I don't intend this to be on Solaris 10 system, and am > waiting for the next production build anyway. As you say, it does not exist > in 2009.06, this is not the latest production Opensolaris which is 2009.11, > and I'd be more interested in its behavior than an older release. The "latest" is b134, which contains many, many fixes over 2009.11, though it's a dev release. > From the information I've been reading about the loss of a ZIL device, it > will be relocated to the storage pool it is assigned to. I'm not sure which > version this is in, but it would be nice if someone could provide the release > number it is included in (and actually works). Also, will > this functionality be included in the mythical 2010.03 release? It went in somewhere around b118, I think, so it will be in the next scheduled release. > Also, I'd be interested to know what features along these lines will be > available in 2010.03 if it ever sees the light of day. Look at the latest dev release. b134 was originally slated to be 2010.03, so the feature set of the final release should be very close. > So what you are saying is that if a single device fails in a striped L2ARC > VDEV, then the entire VDEV is taken offline and the fallback is to simply use > the regular ARC and fetch from the pool whenever there is a cache miss. The strict interpretation of the documentation is that the read is re-issued. My understanding is that the block that failed to be read would then be read from the original pool. > Or, does what you are saying here mean that if I have 4 SSD's in a stripe > for my L2ARC, and one device fails, the L2ARC will be reconfigured > dynamically using the remaining SSD's for L2ARC. Auto-healing in zfs would resilver the block that failed to be read, either onto the same device or another cache device in the pool, exactly as if a read failed on a normal pool device. It wouldn't reconfigure the cache devices, but each failed read would cause the blocks to be reallocated to a functioning device, which has the same effect in the end. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 06/05/2010 15:31, Tomas Ögren wrote: On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes: On Wed, 5 May 2010, Edward Ned Harvey wrote: In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it. How do you know that I don't need it? The ability seems useful to me. The gain is quite minimal.. If the first device fails (which doesn't happen too often I hope), then it will be read from the normal pool once and then stored in ARC/L2ARC again. It just behaves like a cache miss for that specific block... If this happens often enough to become a performance problem, then you should throw away that L2ARC device because it's broken beyond usability. Well, if an L2ARC device fails there might be an unacceptable drop in delivered performance. If it were mirrored, then the drop would usually be much smaller, or there could be no drop at all if a mirror had an option to read from only one side. Being able to mirror the L2ARC might be especially useful once a persistent L2ARC is implemented, as after a node restart or a resource failover in a cluster the L2ARC will be kept warm. Then the only thing which might affect L2 performance considerably would be an L2ARC device failure... -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS - USB 3.0 SSD disk
Hi all, It seems like the market has yet another type of SSD device, this time a USB 3.0 portable SSD by OCZ. Going by the specs, it seems to me that if this device gets a good price it might be quite useful for caching purposes on ZFS-based storage. Take a look at http://www.ocztechnology.com/products/solid-state-drives/usb-3-0-/ocz-enyo-usb-3-0-portable-solid-state-drive.html and we need to wait for prices and for systems with USB 3.0 :) P.S. PCIe 3.0 is nearby... iupi! Bruno ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes: > On Wed, 5 May 2010, Edward Ned Harvey wrote: >> >> In the L2ARC (cache) there is no ability to mirror, because cache device >> removal has always been supported. You can't mirror a cache device, because >> you don't need it. > > How do you know that I don't need it? The ability seems useful to me. The gain is quite minimal.. If the first device fails (which doesn't happen too often I hope), then it will be read from the normal pool once and then stored in ARC/L2ARC again. It just behaves like a cache miss for that specific block... If this happens often enough to become a performance problem, then you should throw away that L2ARC device because it's broken beyond usability. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On Wed, 5 May 2010, Edward Ned Harvey wrote: In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it. How do you know that I don't need it? The ability seems useful to me. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance of the ZIL
On May 6, 2010, at 8:34 AM, Edward Ned Harvey wrote: From: Pasi Kärkkäinen [mailto:pa...@iki.fi] In neither case do you have data or filesystem corruption. ZFS probably is still OK, since it's designed to handle this (?), but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes that have been ack'd being done to the servers/applications.. What I meant was: Yes, there's data loss. But no corruption. In other filesystems, if you have an ungraceful shutdown while the filesystem is writing, since filesystems such as EXT3 perform file-based (or inode-based) block write operations, you can have files whose contents have been corrupted... Some sectors of the file still in their "old" state, and some sectors of the file in their "new" state. Likewise, in something like EXT3, you could have some file fully written, while another one that should have been written hasn't been. (AKA, some files written out of order.) In the case of EXT3, since it is a journaled filesystem, the journal only keeps the *filesystem* consistent after a crash. It's still possible to have corrupted data in the middle of a file. I believe ext3 has an option to journal data as well as metadata; it just defaults to metadata. I don't believe out-of-order writes are so much an issue any more since Linux gained write barrier support (and most file systems and block devices now support it). These things don't happen in ZFS. ZFS takes journaling to a whole new level. Instead of just keeping your filesystem consistent, it also keeps your data consistent. Yes, data loss is possible when a system crashes, but the filesystem will never have any corruption. These are separate things now, and never were before. ZFS does NOT have a journal; it has an intent log, which is completely different. A journal logs operations that are to be performed later (the journal is read, then the operation performed); an intent log logs operations that are being performed now, and when the disk flushes, the intent entry is marked complete. ZFS is consistent by the nature of COW, which means a partial write will not become part of the file system (the old block pointer isn't updated till the new block completes the write). In ZFS, losing n seconds of writes leading up to the crash will never result in files partially written, or written out of order. Every atomic write to the filesystem results in a filesystem-consistent and data-consistent view of *some* valid form of all the filesystem and data within it. A ZFS file system will always be consistent, but if an application doesn't flush its data, then it can definitely have partially written data. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance of the ZIL
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ragnar Sundblad > > But if you have an application, protocol and/or user that demands > or expects persistant storage, disabling ZIL of course could be fatal > in case of a crash. Examples are mail servers and NFS servers. Basically, anything which writes to disk based on requests from something across a network. If your system goes down and comes back up thinking itself consistent, there can be one client thinking "A" and another client thinking "B"... even though your server is consistent, the world isn't. Another great example would be if your server handles credit card transactions. If a user clicks "buy now" in a web interface, and the server contacts Visa or MasterCard, records the transaction, and then crashes before it records the transaction to its own disks... then the server would come up with no recollection of that transaction. But the user, and Visa/MasterCard, certainly would remember it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance of the ZIL
> From: Pasi Kärkkäinen [mailto:pa...@iki.fi] > > > In neither case do you have data or filesystem corruption. > > > > ZFS probably is still OK, since it's designed to handle this (?), > but the data can't be OK if you lose 30 secs of writes.. 30 secs of > writes > that have been ack'd being done to the servers/applications.. What I meant was: Yes, there's data loss. But no corruption. In other filesystems, if you have an ungraceful shutdown while the filesystem is writing, since filesystems such as EXT3 perform file-based (or inode-based) block write operations, you can have files whose contents have been corrupted... Some sectors of the file still in their "old" state, and some sectors of the file in their "new" state. Likewise, in something like EXT3, you could have some file fully written, while another one that should have been written hasn't been. (AKA, some files written out of order.) In the case of EXT3, since it is a journaled filesystem, the journal only keeps the *filesystem* consistent after a crash. It's still possible to have corrupted data in the middle of a file. These things don't happen in ZFS. ZFS takes journaling to a whole new level. Instead of just keeping your filesystem consistent, it also keeps your data consistent. Yes, data loss is possible when a system crashes, but the filesystem will never have any corruption. These are separate things now, and never were before. In ZFS, losing n seconds of writes leading up to the crash will never result in files partially written, or written out of order. Every atomic write to the filesystem results in a filesystem-consistent and data-consistent view of *some* valid form of all the filesystem and data within it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On Thu, May 06, 2010 at 01:15:41PM +0100, Robert Milkowski wrote: > On 06/05/2010 13:12, Robert Milkowski wrote: > >On 06/05/2010 12:24, Pawel Jakub Dawidek wrote: > >>I read that this property is not inherited and I can't see why. > >>If what I read is up-to-date, could you tell why? > > > >It is inherited. Sorry for the confusion but there was a discussion if > >it should or should not be inherited, then we propose that it > >shouldn't but it was changed again during a PSARC review that it should. > > > >And I did a copy'n'paste here. > > > >Again, sorry for the confusion. > > > Well, actually I did copy'n'paste a proper page as it doesn't say > anything about inheritance. > > Nevertheless, yes it is inherited. Yes, your e-mail didn't mention that and I wanted to clarify if what I read in PSARC changed or not. Thanks:) -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On 06/05/2010 13:12, Robert Milkowski wrote: On 06/05/2010 12:24, Pawel Jakub Dawidek wrote: I read that this property is not inherited and I can't see why. If what I read is up-to-date, could you tell why? It is inherited. Sorry for the confusion but there was a discussion if it should or should not be inherited, then we propose that it shouldn't but it was changed again during a PSARC review that it should. And I did a copy'n'paste here. Again, sorry for the confusion. Well, actually I did copy'n'paste a proper page as it doesn't say anything about inheritance. Nevertheless, yes it is inherited. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On 06/05/2010 12:24, Pawel Jakub Dawidek wrote: I read that this property is not inherited and I can't see why. If what I read is up-to-date, could you tell why? It is inherited. Sorry for the confusion but there was a discussion if it should or should not be inherited, then we propose that it shouldn't but it was changed again during a PSARC review that it should. And I did a copy'n'paste here. Again, sorry for the confusion. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
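A sketch of the inherited behavior (dataset names hypothetical, output illustrative):

# zfs set sync=disabled tank/home
# zfs create tank/home/bob
# zfs get -r sync tank/home
NAME           PROPERTY  VALUE     SOURCE
tank/home      sync      disabled  local
tank/home/bob  sync      disabled  inherited from tank/home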
Re: [zfs-discuss] [indiana-discuss] image-update doesn't work anymore (bootfs not supported on EFI)
On Wed, 2010-05-05 at 09:45 -0600, Evan Layton wrote: > No that doesn't appear like an EFI label. So it appears that ZFS > is seeing something there that it's interpreting as an EFI label. > Then the command to set the bootfs property is failing due to that. > > To restate the problem the BE can't be activated because we can't set > the bootfs property of the root pool and even the ZFS command to set > it fails with "property 'bootfs' not supported on EFI labeled devices" > > for example the following command: > # zfs set bootfs=rpool/ROOT/opensolaris rpool > > fails with that same error message. I guess you mean zpool, but yes:

# zpool set bootfs=rpool/ROOT/opensolaris-138 rpool
cannot set property for 'rpool': property 'bootfs' not supported on EFI labeled devices

> Do you have any of the older BEs like build 134 that you can boot back > to and see if those will allow you to set the bootfs property on the > root pool? It's just really strange that out of nowhere it started > thinking that the device is EFI labeled. I have a couple of BEs I could boot to:

$ beadm list
BE               Active  Mountpoint  Space   Policy  Created
opensolaris      -       -           1.00G   static  2009-10-01 08:00
opensolaris-124  -       -           20.95M  static  2009-10-03 13:30
opensolaris-125  -       -           30.00M  static  2009-10-17 15:18
opensolaris-126  -       -           25.33M  static  2009-10-29 20:18
opensolaris-127  -       -           1.37G   static  2009-11-14 13:20
opensolaris-128  -       -           1.91G   static  2009-12-04 14:28
opensolaris-129  -       -           22.49M  static  2009-12-12 11:31
opensolaris-130  -       -           21.64M  static  2009-12-26 19:46
opensolaris-131  -       -           24.72M  static  2010-01-22 22:51
opensolaris-132  -       -           57.32M  static  2010-02-09 23:05
opensolaris-133  -       -           1.07G   static  2010-02-20 12:55
opensolaris-134  N       /           43.17G  static  2010-03-08 21:58
opensolaris-138  R       -           1.81G   static  2010-05-04 12:03

I will try on 132 or 133. Get back to you later. -- Christian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Another MPT issue - kernel crash
On 5/05/10 10:42 PM, Bruno Sousa wrote: Hi all, I have faced yet another kernel panic that seems to be related to the mpt driver. This time I was trying to add a new disk to a running system (snv_134) and this new disk was not being detected... following a tip I ran the lsitool to reset the bus, and this led to a system panic. MPT driver: BAD TRAP: type=e (#pf Page fault) rp=ff001fc98020 addr=4 occurred in module "mpt" due to a NULL pointer dereference If someone has a similar problem it might be worthwhile to expose it here or to add information to the filed bug, available at https://defect.opensolaris.org/bz/show_bug.cgi?id=15879 That's an already-known CR, tracked in Bugster. I've updated defect.o.o and transferred your info to the Bugster CR, 6895862. Until the nightly inside->outside bugs.o.o sync-up it'll still show up as closed, but don't worry, I've re-opened it. James C. McPherson -- Senior Software Engineer, Solaris Oracle http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On 06/05/2010 12:24, Pawel Jakub Dawidek wrote: I read that this property is not inherited and I can't see why. If what I read is up-to-date, could you tell why? It is inherited, this changed as a result of the PSARC review. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
On Thu, May 06, 2010 at 11:28:37AM +0100, Robert Milkowski wrote: > With the put back of: > > [PSARC/2010/108] zil synchronicity > > zfs datasets now have a new 'sync' property to control synchronous > behaviour. > [...] > The property can be set when the dataset is created, or dynamically, > and will take effect immediately. I read that this property is not inherited and I can't see why. If what I read is up-to-date, could you tell why? -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mirroring USB Drive with Laptop for Backup purposes
Based on comments, some people say nay, some say yea, so I decided to give it a spin and see how I get on. To make my mirror bootable I followed the instructions posted here: http://www.taiter.com/blog/2009/04/opensolaris-200811-adding-disk.html I plan to do a quick write-up myself of my own experience, but so far everything is working fine. The mirror size is 200GB (the smallest disk, which happens to be the laptop disk); once I attached the USB drive, it started resilvering straight away, and it only took 1hr 45min to complete, and it resilvered 120G!! This I was very impressed with. So far I've not noticed any system performance degradation with the drive attached. I did a quick test: yanked out the drive; it degrades rpool as expected, but the system continues to function fine. I also did a quick test to see if the USB drive was indeed bootable by connecting it to another laptop; it booted perfectly. Connecting the USB drive back to the original laptop, the pool comes back online and resilvers seamlessly. This is automatic 24/7 backup at its best... One thing I did notice: I powered down yesterday whilst the USB drive was attached; this morning I booted without it attached, and the laptop failed to boot. I had to connect the USB drive, and then it booted up fine. The key would be to degrade the pool before shutdown, e.g. disconnect the USB drive; I might try using zpool offline and see how that works. If I encounter issues, I'll post again. cheers Matt On 05/ 5/10 09:34 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Matt Keenan Just wondering whether mirroring a USB drive with the main laptop disk for backup purposes is recommended or not. Plan would be to connect the USB drive, once or twice a week, let it resilver, and then disconnect again. Connecting the USB drive 24/7 would AFAIK have performance issues for the laptop. MMmmm... If it works, sounds good. But I don't think it'll work as expected, for a number of reasons, outlined below. The suggestion I would have instead would be to make the external drive its own separate zpool, and then you can incrementally "zfs send | zfs receive" onto the external. Here are the obstacles I think you'll have with your proposed solution: #1 I think the entire used portion of the filesystem needs to resilver every time. I don't think there's any such thing as an incremental resilver. #2 How would you plan to disconnect the drive? If you zpool detach it, I think it's no longer a mirror, and not mountable. If you simply yank out the plug... although that might work, it would certainly be nonideal. If you power off, disconnect, and power on... Again, it should probably be fine, but it's not designed to be used that way intentionally, so your results... are probably as-yet untested. I don't want to go on. This list could go on forever. I will strongly encourage you to simply use "zfs send | zfs receive" because that's a standard-practice thing to do. It is known that the external drive is not bootable this way, but that's why you have this article on how to make it bootable: http://docs.sun.com/app/docs/doc/819-5461/ghzur?l=en&a=view This would have the added benefit of the USB drive being bootable. By default, AFAIK, that's not correct. When you mirror rpool to another device, by default the 2nd device is not bootable, because it's just got an rpool in there. No boot loader.
> Even if you do this mirror idea, which I believe will be slower and less reliable than "zfs send | zfs receive", you still haven't gained anything over the "zfs send | zfs receive" procedure, which is known to work reliably and with optimal performance.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
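Two quick sketches to go with the above, with all pool, dataset, and device names hypothetical stand-ins.

First, on Matt's "degrade the pool before shutdown" idea: taking the mirror leg offline cleanly is one command each way. Before unplugging the USB disk (here c5t0d0):

# zpool offline rpool c5t0d0

and after reconnecting it:

# zpool online rpool c5t0d0

"zpool online" puts the device back into the mirror and starts a resilver; progress can be watched with "zpool status rpool".

Second, the send/receive cycle Edward recommends, with "extpool" being a separate pool on the external drive. The first backup sends everything:

# zfs snapshot -r rpool@backup-20100507
# zfs send -R rpool@backup-20100507 | zfs receive -Fdu extpool

and later runs send only the delta since the previous snapshot:

# zfs snapshot -r rpool@backup-20100514
# zfs send -R -i rpool@backup-20100507 rpool@backup-20100514 | zfs receive -Fdu extpool

Here -R replicates the whole dataset tree (properties and snapshots included), -i makes the stream incremental between the two snapshots, and -u keeps the received filesystems from being mounted over the live ones.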
Re: [zfs-discuss] Does Opensolaris support thin reclamation?
Please see this thread for further info on the topic: http://www.opensolaris.org/jive/thread.jspa?threadID=120824&start=0&tstart=0

In short, ZFS doesn't support thin reclamation today, although we have an RFE open to implement it at some point in the future.

Regards, sendai -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Heads Up: zil_disable has expired, ceased to be, ...
With the put back of:

[PSARC/2010/108] zil synchronicity

zfs datasets now have a new 'sync' property to control synchronous behaviour. The zil_disable tunable to turn synchronous requests into asynchronous requests (disable the ZIL) has been removed. For systems that used that switch, on upgrade you will now see a message on booting:

sorry, variable 'zil_disable' is not defined in the 'zfs' module

Please update your system to use the new sync property. Here is a summary of the property:

---

The options and semantics for the zfs sync property:

sync=standard
   This is the default option. Synchronous file system transactions
   (fsync, O_DSYNC, O_SYNC, etc.) are written out (to the intent log),
   and then all written devices are flushed to ensure the data is
   stable (not cached by device controllers).

sync=always
   For the ultra-cautious, every file system transaction is written
   and flushed to stable storage by system call return. This obviously
   has a big performance penalty.

sync=disabled
   Synchronous requests are disabled. File system transactions only
   commit to stable storage on the next DMU transaction group commit,
   which can be many seconds away. This option gives the highest
   performance, with no risk of corrupting the pool. However, it is
   very dangerous, as ZFS ignores the synchronous transaction demands
   of applications such as databases or NFS. Setting sync=disabled on
   the currently active root or /var file system may result in
   out-of-spec behavior or application data loss and increased
   vulnerability to replay attacks. Administrators should only use
   this when these risks are understood.

The property can be set when the dataset is created, or dynamically, and will take effect immediately. To change the property, an administrator can use the standard 'zfs' command. For example:

# zfs create -o sync=disabled whirlpool/milek
# zfs set sync=always whirlpool/perrin

--
Team ZIL

It should be in build 140. For a little more information on it you might look at http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html

--
Robert Milkowski
http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
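For anyone migrating off the old tunable: it lived in /etc/system as a module variable setting, along the lines of "set zfs:zil_disable = 1" (check the exact line in your own file before removing it). After deleting that line and rebooting, the equivalent effect is now scoped per dataset with the standard commands; "whirlpool/scratch" below is a hypothetical dataset name:

# zfs set sync=disabled whirlpool/scratch
# zfs get sync whirlpool/scratch

The upside of the property over the tunable is exactly this scoping: only datasets that can tolerate losing the last few seconds of transactions need to opt in, rather than the whole machine.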
Re: [zfs-discuss] cannot create snapshot: dataset is busy
On Thu, May 6, 2010 at 1:31 AM, Brandon High wrote: > Any other way to fix it? There's no data in the zvol that I can't > easily reproduce if it needs to be destroyed. I did a rollback to the most recent snapshot, which seems to have worked. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
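For the archives, the rollback that cleared this is a one-liner; the zvol and snapshot names below are hypothetical stand-ins:

# zfs rollback tank/myvol@latest

Bear in mind that rollback discards everything written to the volume since that snapshot (add -r only if you also want to destroy any more recent snapshots), which is fine here precisely because the data is easy to reproduce.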
[zfs-discuss] cannot create snapshot: dataset is busy
I'm unable to snapshot a dataset, receiving the error "dataset is busy". Google and some bug reports suggest it's caused by a ZIL that hasn't been completely replayed, and that mounting and unmounting the dataset will fix it. Which is great, except it's a zvol. Any other way to fix it? There's no data in the zvol that I can't easily reproduce if it needs to be destroyed. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance of the ZIL
On 6 May 2010, at 08.17, Pasi Kärkkäinen wrote:
> On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski
>>>
>>> if you can disable the ZIL and compare the performance to when it is on, it will give you an estimate of the absolute maximum performance increase (if any) from having a dedicated ZIL device.
>>
>> I'll second this suggestion. It'll cost you nothing to disable the ZIL temporarily. (You have to dismount the filesystem twice: once to disable the ZIL, and once to re-enable it.) Then you can see if performance is good. If performance is good, then you'll know you need to accelerate your ZIL. (Because a disabled ZIL is the fastest thing you could possibly ever do.)
>>
>> Generally speaking, you should not disable your ZIL for the long run. But in some cases it makes sense.
>>
>> Here's how you determine if you want to disable your ZIL permanently:
>>
>> First, understand that with the ZIL disabled, all sync writes are treated as async writes. These are buffered in RAM before being written to disk, so the kernel can optimize and aggregate the write operations into one big chunk.
>>
>> No matter what, if you have an ungraceful system shutdown, you will lose all the async writes that were waiting in RAM.
>>
>> If you have the ZIL disabled, you will also lose the sync writes that were waiting in RAM (because those are being handled as async).
>>
>> In neither case do you have data or filesystem corruption.
>
> ZFS itself probably is still OK, since it's designed to handle this (?), but the data can't be OK if you lose 30 secs of writes.. 30 secs of writes that have been ack'd as done to the servers/applications..

Entirely right! This is the case for many local user writes anyway, since many applications don't sync the written data to disk. But if you have an application, protocol and/or user that demands or expects persistent storage, disabling the ZIL could of course be fatal in case of a crash. Examples are mail servers and NFS servers.

/ragge

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
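Worth noting that with the new per-dataset sync property (see the zil_disable heads-up elsewhere in this digest), the temporary test Edward describes no longer needs a remount. A minimal sketch, with tank/test as a hypothetical stand-in for the dataset carrying the workload:

# zfs set sync=disabled tank/test
(run the write-heavy benchmark here)
# zfs set sync=standard tank/test

If throughput jumps noticeably with sync=disabled, the ZIL is the bottleneck, and a dedicated log device (e.g. an SSD slog) should recover much of that gain without giving up synchronous semantics.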
Re: [zfs-discuss] why both dedup and compression?
On Thu, May 6, 2010 at 2:06 AM, Richard Jahnel wrote:
> I've googled this for a bit, but can't seem to find the answer.
>
> What does compression bring to the party that dedupe doesn't cover already?

Compression reduces the storage requirements for non-duplicate data.

As an example, I have a system to which I rsync the web application data from a whole bunch of servers (zones). There's a fair amount of duplication in the application files (java, tomcat, apache, and the like), so dedup is a big win. On the other hand, there's essentially no duplication whatsoever in the log files, which are pretty big but compress really well. So having both enabled works really well.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
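To make that setup concrete: dedup and compression are independent per-dataset properties, so they can simply be stacked. A sketch with a hypothetical dataset name:

# zfs set dedup=on tank/webapps
# zfs set compression=on tank/webapps

Both properties affect only blocks written after they are set, and dedup in particular needs enough RAM (or L2ARC) to keep the dedup table fast, so it pays to enable it only on datasets where duplication is genuinely expected.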