Re: [zfs-discuss] Cause for data corruption?
On Monday 25 February 2008 at 11:05 -0800, Sandro wrote:

> Hi folks,
>
> I've been running my fileserver at home on Linux for a couple of years,
> and last week I finally reinstalled it with Solaris 10 U4. I borrowed a
> bunch of disks from a friend, copied over all the files, reinstalled
> the fileserver, and copied the data back. Everything went fine, but
> after a few days quite a lot of files got corrupted. Here's the output:
>
> # zpool status data
>   pool: data
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore
>         the entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 422 errors on Mon Feb 25 00:32:18 2008
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         data        ONLINE       0     0 5.52K
>           raidz1    ONLINE       0     0 5.52K
>             c0t0d0  ONLINE       0     0 10.72
>             c0t1d0  ONLINE       0     0 4.59K
>             c0t2d0  ONLINE       0     0 5.18K
>             c0t3d0  ONLINE       0     0 9.10K
>             c1t0d0  ONLINE       0     0 7.64K
>             c1t1d0  ONLINE       0     0 3.75K
>             c1t2d0  ONLINE       0     0 4.39K
>             c1t3d0  ONLINE       0     0 6.04K
>
> errors: 388 data errors, use '-v' for a list
>
> Last night I found out about this; it told me there were errors in
> about 50 files. So I scrubbed the whole pool and it found a lot more
> corrupted files. The temporary system I used to hold the data while
> installing Solaris on my fileserver runs nv build 80, and there are no
> errors on it. What could be the cause of these errors? I don't see any
> hardware errors on my disks.
> # iostat -En | grep -i error
> c3d0    Soft Errors: 0   Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c4d0    Soft Errors: 0   Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t0d0  Soft Errors: 574 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t0d0  Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t1d0  Soft Errors: 14  Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t2d0  Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c0t3d0  Soft Errors: 549 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t1d0  Soft Errors: 548 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t2d0  Soft Errors: 14  Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> c1t3d0  Soft Errors: 548 Hard Errors: 0 Transport Errors: 0 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
>
> ... although there are a lot of soft errors. Linux said that one disk
> had gone bad, but I figured the SATA cable was somehow broken, so I
> replaced it before installing Solaris. And Solaris didn't and doesn't
> see any actual hardware errors on the disks, does it?

I had the same symptoms recently. I also thought the disks were dying, but I was wrong. I suspected the RAM: no. Finally it turned out to be because I had mixed RAID cards on different PCI buses: two 64-bit buses (no problem with those) and one 32-bit PCI bus, which caused *all* the checksum errors. I kicked out the card on the 32-bit PCI bus and everything worked fine.
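To narrow this down on the affected machine, the standard next steps would look like the following sketch ("data" is the pool name from Sandro's output; these commands must be run on the system itself, after the suspect hardware has been replaced or reseated):

```shell
# List exactly which files are affected by the checksum errors
zpool status -v data

# After fixing the suspected hardware (cables, controller, RAM),
# reset the error counters and re-verify every block in the pool
zpool clear data
zpool scrub data
```

If a subsequent scrub completes with zero checksum errors, the corruption source (bus, controller, cabling, or memory) has most likely been removed; files already reported as corrupted still need to be restored from backup.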
Hope it helps,

-- 
Nicolas Szalay
Systems and network administrator
-- 
ASCII ribbon campaign - against HTML email & vCards

signature.asc
Description: This is a digitally signed message part

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Preferred backup s/w
Rich Teer <[EMAIL PROTECTED]> wrote:

> > People who like to backup usually also like to do incremental
> > backups. Why don't you?
>
> I do like incremental backups. But the ability to do incremental
> backups and the ability to restore arbitrary files from an archive are
> two different things. An incremental backup backs up files that have
> changed since the most recent backup. So suppose my home directory
> contains 1000 files, 100 of which have changed since my last backup. I
> perform an incremental backup of my home directory, and the resulting
> archive contains those 100 files. Now suppose that I accidentally
> delete a couple of those files; it is very desirable to be able to
> restore just a certain named subset of the files in an archive rather
> than having to restore the whole archive. I'm looking for a tool that
> can do that.

Hi Rich,

I asked you a question that you have not yet answered: are you interested only in full backups and the ability to restore single files from that type of backup? Or are you interested in incremental backups that _also_ allow you to reduce the daily backup size but still give you the ability to restore single files?

I am asking this because there are some backup programs that do not fit into the list above. The Amanda people, for example, call something an "incremental backup" that does not allow you to restore an empty disk up to the state of the last incremental. Amanda in this case suffers from the problem that GNU tar does not let you do a restore onto an empty disk if someone renamed directories in a way that triggers the conceptual problems in GNU tar.

So it seems important to me to first find out what kind of backup you are interested in. Please answer my questions!
Jörg

-- 
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Preferred backup s/w
michael schuster <[EMAIL PROTECTED]> wrote:

> Rich never said so. He said "the ability to do incremental backups and
> restore arbitrary files from an archive are two different things". You
> were addressing an issue he never brought up.

I really don't understand why you did not answer my question. It is obvious that there is some confusion in the question, and it is not possible to continue the discussion if you do not try to help resolve it.

Jörg

-- 
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Preferred backup s/w
Darren J Moffat <[EMAIL PROTECTED]> wrote:

> zfs-discuss is fine, but the thread has gone into non-ZFS-related,
> generic backup stuff. If there are ZFS specifics - like the question
> about extended attributes - then I think this is a reasonable place to
> discuss them. Discussion about the nomenclature of Amanda, when it
> does not concern ZFS, is not appropriate here. You are welcome to
> create a mailing list for generic backup stuff.

The discussion here seems to have been started by people who are looking for a backup solution suitable for ZFS.

Jörg

-- 
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil
> For Linux NFS service, it's an option in /etc/exports. The default for
> modern (post-1.0.1) NFS utilities is "sync", which means that data and
> metadata will be written to the disk whenever NFS requires it
> (generally upon an NFS COMMIT operation). This is the same as Solaris
> with UFS, or with ZFS+ZIL. This works with XFS, ext3, and any other
> file system with a working fsync().

OK, I did know that; I forgot to mention in my question that my doubt was whether Linux would really honour the "sync". Do you understand? I had read that Linux does not (even with "sync" in exports). In NFSv2, for example, it does not matter whether you put "sync" or "async": the server ACKs as soon as it receives the request (a no-op). But if you are telling me that *now* Linux really does sync the disks before ACKing the client, well... then there is a huge difference between zfs/nfs and xfs/nfs, because the numbers I posted are with "sync" on Linux.

> It's possible to switch this off on Linux, but it's not recommended,
> as there is a chance that data could be lost if the server crashed.
> (For the same reason, the ZIL should not be disabled on a Solaris NFS
> server.)

I understand that, so I have not even tried to disable the ZIL until now. All the tests I have made respected a semantically correct NFS service. If only the ZIL could be configured per filesystem, or per pool... The difference is 7.5 s versus 1.0 s, and theoretically ZFS is more efficient than XFS.

This message posted from opensolaris.org
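For reference, the sync/async behaviour discussed above is set per export in /etc/exports on the Linux side. A minimal sketch (the path and netmask are made-up examples):

```shell
# /etc/exports -- Linux NFS server export options.
# "sync"  : the server commits data to stable storage before replying
#           (the safe default in post-1.0.1 nfs-utils).
# "async" : the server replies immediately; faster, but data can be
#           lost if the server crashes.
/export/data  192.168.1.0/24(rw,sync,no_subtree_check)
```

After editing the file, `exportfs -ra` re-reads the export table on a typical Linux NFS server.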
[zfs-discuss] modification to zdb to decompress blocks
Hi All,

I have modified zdb to do decompression in zdb_read_block. The syntax is:

# zdb -R poolname:devid:blkno:psize:d,compression_type,lsize

where compression_type can be lzjb or any other compression type that zdb uses, and lsize is the logical size, i.e., the size after decompression.

I have used this with a modified mdb to do the following: given a pathname for a file on a ZFS file system, display the blocks (i.e., the data) of the file. The file system need not be mounted. If anyone is interested, send me email. I can send a webrev of the zdb changes to those interested. As for the mdb changes, I sent a webrev of those a while ago, and have since added a "rawzfs" dmod.

I plan to present a paper at OSDevCon in Prague in June that uses the modified zdb and mdb to show the physical layout of a ZFS file system. (I should mention that, over time, I have found that the ZFS on-disk format paper actually does tell you almost everything.)

max
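A hypothetical invocation following max's syntax above - all the numeric values (device id, block number, physical and logical sizes) are made up for illustration, and the `d,…` decompression suffix exists only in max's private zdb patch, not in stock zdb:

```shell
# Read a block from device 0 of pool "data": physical (on-disk) size
# 0x800 at block offset 0x1a000, lzjb-decompressed to a logical size
# of 0x20000. Values are hypothetical.
zdb -R data:0:1a000:800:d,lzjb,20000
```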
Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil
I would imagine that Linux behaves more like ZFS does when it does not flush caches (google "Evil zfs_nocacheflush"). If you can untar files over NFS on Linux faster than one file per rotational latency, that is suspicious.

-r

On 26 February 2008 at 13:16, msl wrote:

> OK, I did know that; I forgot to mention in my question that my doubt
> was whether Linux would really honour the "sync". [...]
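For reference, the tunable Roch alludes to is the commonly documented /etc/system setting below. This is shown only to make the reference concrete - it trades correctness for speed and should never be used where data matters:

```shell
# /etc/system -- test systems only: tell ZFS not to issue cache-flush
# commands to the disks (the "evil" zfs_nocacheflush tunable; risks
# data loss on power failure with volatile write caches). Requires a
# reboot to take effect.
set zfs:zfs_nocacheflush = 1
```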
Re: [zfs-discuss] Cause for data corruption?
Hey, thanks for your answers, guys.

I'll run VTS to stress-test the CPU and memory. And I just checked the block diagram of my motherboard (Gigabyte M61P-S3): it doesn't even have 64-bit PCI slots, just standard old 33 MHz 32-bit PCI and a couple of newer PCIe slots. But my two controllers are the same vendor/version and are both connected to the same PCI bus.
Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil
Actually, I have some corrections to make. When I saw the numbers I was stunned, and that kept me from thinking straight. Here you can see the right numbers:

http://www.posix.brte.com.br/blog/?p=104

The problem was the disks on which I made the tests. Thanks for your time.
Re: [zfs-discuss] Preferred backup s/w
On Tue, 26 Feb 2008, Joerg Schilling wrote:

> Hi Rich,
>
> I asked you a question that you did not yet answer:

Hi Jörg,

> Are you interested only in full backups and in the ability to restore
> single files from that type of backup? Or are you interested in
> incremental backups that _also_ allow you to reduce the daily backup
> size but still give you the ability to restore single files?

Both: I'd like to be able to restore single files from both a full and an incremental backup of a ZFS file system.

-- 
Rich Teer, SCSA, SCNA, SCSECA, OGB member
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com
Re: [zfs-discuss] Preferred backup s/w
Rich Teer <[EMAIL PROTECTED]> wrote:

> > Are you interested only in full backups and in the ability to
> > restore single files from that type of backup? Or are you interested
> > in incremental backups that _also_ allow you to reduce the daily
> > backup size but still give you the ability to restore single files?
>
> Both: I'd like to be able to restore single files from both a full and
> an incremental backup of a ZFS file system.

OK, then the only filesystem-independent program I know of that can do what you want is star.

- The solution from David Korn's site does differential backups and is thus unable to easily restore single files.
- GNU tar fails with incremental restores if there was a certain kind of directory rename between two incrementals.
- Other programs do not support incrementals.

Jörg

-- 
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
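To make the full-vs-incremental distinction concrete, here is a minimal runnable sketch using plain GNU tar (not star syntax - and note Jörg's caveat that GNU tar's incremental restore breaks on certain directory renames): a level-0 backup, a level-1 incremental, and a single named file restored from the incremental.

```shell
#!/bin/sh
# Sketch: incremental backup and single-file restore with GNU tar.
set -e
work=$(mktemp -d) && cd "$work"
mkdir -p data
echo "version 1" > data/a.txt
echo "keep me"   > data/b.txt

# Level 0 (full): the snapshot file records what was backed up.
tar -cf level0.tar --listed-incremental=snap data

# Change one file, then take a level-1 (incremental) backup against a
# copy of the level-0 snapshot file.
echo "version 2" > data/a.txt
cp snap snap.1
tar -cf level1.tar --listed-incremental=snap.1 data

# Restore just the one named file from the incremental archive.
mkdir restore
tar -xf level1.tar --listed-incremental=/dev/null -C restore data/a.txt
cat restore/data/a.txt   # -> version 2
```

The level-1 archive contains only the changed file, yet a single named member can still be pulled out of it - exactly the "restore a named subset" ability Rich is asking for.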
Re: [zfs-discuss] Can ZFS be event-driven or not?
On Tue, Feb 26, 2008 at 2:07 PM, Nicolas Williams <[EMAIL PROTECTED]> wrote:

> How do you use CDP backups? How do you decide at which write(2) (or
> dirty page write, or fsync(2), ...) to restore some file? What if the
> app has many files? Point-in-time? Sure, but since you can't restore
> all application state (unless you're checkpointing processes too),
> how can you be sure that the data to be restored is internally
> consistent? And if you'll checkpoint processes, then why not just use
> VMs and checkpoint those and their filesystems instead? The last
> option sounds much, much simpler to manage: there's only a VM name
> and timestamp to think about when restoring. A continuous VM
> checkpoint facility sounds... unlikely/expensive, though.

Sorry, I don't understand any of this. But I never pretended I did. My post was about something else. In principle we have three types of write (atomic view, please):

1. Create. The new file only needs to be written; no backup/CDP is needed. Identical to any conventional system.
2. Edit/modify. Here we need to store some incremental/differential file content - rsync-like, that is.
3. Remove. This is also similar to the conventional system, except that the files need to be retired and the blocks *not* marked as 'available'.

Changes combined with a 'write'/'save' instruction are not seen very frequently on personal/home machines. (Let's leave out the web cache and /tmp.) But even on the servers I am running, the gigabytes of user data do not change very much, seen as a percentage of the overall data. Most of the 200,000 files the users have remain unmodified for ages. Office files do change, but not much faster than the users can type ;). Web content changes rarely; style sheets and icons remain unmodified close to forever. The largest changes come with system/software upgrades. (One might even discuss excluding these from CDP, and instead automating a snapshot before them - in case of a problem thereafter. But that is not my topic here and now.)
Also, the granularity of the 'backups' does not really have to be 100%. If - for reasons I cannot imagine - a certain file were marked for 'save' three times in a single second, of course you don't need all the states. You have the state at the start of that second (to which you can roll), as well as the state at the end of that second (to which you can roll just as well; and you can even roll back and forward). I can hardly imagine a data file to which one would want to roll that was invalid at the start of that second, is invalid at the end, but was valid for some milliseconds in between. (How one could know about this intermediate correctness would have to be asked.) Outside of databases, a valid state once per 10 seconds is probably already overdoing it.

Don't forget: even if you deleted the file, it will still be there. If you 'save' a file, make a change, 'save' again, make a mistake and 'save' again, notice you made a mistake ... and all this within 10 seconds! ... you will still have the state at the beginning of those 10 seconds, as well as the state at the end of them. Ten seconds are a hell of a lot of time to calculate and store an incremental difference of a single file. Whereas for a TimeMachine, 10 seconds can be a hell of a short time. Plus the huge overhead there, because you need to poll regularly - possibly at much too high a level - to find which files have changed. Actually, chances are that none at all have changed (at least in the /home of the user, or even of the user*s*). Once it is event-driven, 'no change' means no activity at all. Once it is event-driven and you have three changes in 10 seconds, I am pretty sure that all the states can be handled without much trouble.

Uwe
Re: [zfs-discuss] Can ZFS be event-driven or not?
On Wed, Feb 27, 2008 at 01:45:41AM +0800, Uwe Dippel wrote:

> Sorry, I don't understand any of this. But I never pretended I did.

Well, if you want some feature then you should understand what it is. Sure, continuous data protection sounds really good, but you have to understand that any CDP solution has to have knowledge of, or even be driven by, your applications - otherwise CDP isn't really CDP. This is explained below.

> My post was about something else. In principle we have three types of
> write (atomic view, please):

Atomic view?

> 1. Create. The new file only needs to be written; no backup/CDP is
>    needed. Identical to any conventional system.
> 2. Edit/modify. Here we need to store some incremental/differential
>    file content - rsync-like, that is.

The rub is this: how do you know when a file edit/modify has completed? The answer is: it depends on what application we're talking about!

> 3. Remove. This is also similar to the conventional system, except
>    that the files need to be retired and the blocks *not* marked as
>    'available'.

If an application has many files, then an edit/modify may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? The answer is still the same.

My point is this: because the interesting times at which to take checkpoints are application-specific, we can't have a useful application-independent CDP solution. An application-independent CDP solution would not necessarily (not likely!) produce checkpoints that are safe to restore to. If you don't know whether it's safe to restore to a given checkpoint, and finding out is hard, then what use is that checkpoint? And if you know it isn't safe, then the checkpoint is truly useless - it'll just sit there, taking up space.

CDP really must be an application feature. Using ZFS snapshots could certainly make it easier to implement app-level CDP, and having the ability to snapshot/clone at a finer granularity than datasets (e.g., per-file) would help too.
But ZFS _alone_ cannot provide a useful CDP solution.

Nico
--
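The "app-level CDP on top of ZFS snapshots" idea can be sketched as follows - the dataset name is hypothetical, and the key point is that the *application*, not the filesystem, decides when to call this, at its own consistency points (after a successful save, transaction commit, etc.):

```shell
#!/bin/sh
# Sketch: an application invokes this at its own consistency points.
# "tank/home" is a made-up dataset name; requires a ZFS system.
DATASET=tank/home
zfs snapshot "${DATASET}@checkpoint-$(date -u +%Y%m%dT%H%M%SZ)"
```

Because the snapshot is taken exactly when the application knows its on-disk state is consistent, every snapshot is a safe restore point - which is Nico's criterion above.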
Re: [zfs-discuss] Can ZFS be event-driven or not?
> Can someone please point me to a link, or just unambiguously say 'yes'
> or 'no' to my question: can ZFS produce a snapshot of whatever type,
> initiated by a signal that in turn is derived from a change (edit) of
> a file - like inotify in Linux 2.6.13 and above?

Hi Uwe,

I wasn't previously familiar with inotify, so I may be off here... But as I understand it, inotify generates asynchronous events, which something else consumes (e.g. a backup tool). I believe the asynchronous nature of inotify prevents it from enabling true CDP; i.e., it would enable very frequent backups, but there may still be rewrites occurring before the first async event is delivered and processed. But based on your later comments, I think you're just looking for very frequent backups, not necessarily capturing every unique file version?

You might want to look at the information we've started posting about ADM (an HSM). There are two general use cases for ADM: a backup solution, and a disk extender. ADM will use a subset of DMAPI to monitor file system activity; after skimming some brief info on inotify, I believe DMAPI is similar to inotify. ADM will use this to receive file-modification events (among other event types), which, based on policy, will trigger archive requests to tape and/or disk archives. Note that ADM will only archive whole files (not just the incremental changes).

Additionally, since it's an HSM, archived files may (based on policy, etc.) be released from the file system. This is the disk-extender part. Think of it as an under-the-covers truncate that frees the disk space. When the file data is accessed in the future, events trigger ADM to stage the file back in from the archives. Users would notice a delay (as it is staged in), but would not have to take explicit action to get the file data resident again. Releasing files will, of course, be optional.

ADM could provide frequent backups if configured to make archives soon after file modifications.
Since we archive the whole file, this would not be appropriate for large files with frequent small changes. Also, frequent backups would only be appropriate for disk archiving (due to tape load times and tape wear). Keep in mind that CDP is not the design center here: if configured to approach CDP behavior on a rapidly changing filesystem, one can imagine it hammering the filesystem and still not keeping up.

Also, ADM archives are very different from ZFS snapshots. We have not yet defined how a user would explicitly access a specific archive. The expectation is that we'll provide a way to see all the versions we have for a file, and the user can tell us either to restore one over the current contents of the file, or to restore it to a new file.

http://opensolaris.org/os/project/adm/WhatisADM/

-Joe
[zfs-discuss] path-name encodings
Are path-names text or raw data in ZFS? I.e., is it possible to know what the name of a file/dir/whatever is, or do I have to make more or less wild guesses about which encoding is used where?

- Marcus
Re: [zfs-discuss] Can ZFS be event-driven or not?
Are you indicating that the filesystem knows, or should know, what an application is doing? It seems to me that to achieve what you are suggesting, that's exactly what it would take. Or you are assuming that there are no co-dependent files in the applications out there... Whichever the case, I'm confused! Unless you are perhaps suggesting an ioctl that an application could call to indicate "I'm done for this round, please snapshot", or something to that effect. Even then, I'm still confused as to how I would do anything much more useful with this than with, say, 1-minute snapshots.

Nathan.

Uwe Dippel wrote:

> > Atomic view?
>
> Your post was on the gory details of how ZFS writes. "Atomic view"
> here means that the 'save' of a file is an 'atomic' operation: at one
> moment in time you click 'save', and at some other moment in time it
> is done. It means indivisible, and from the perspective of the user
> this is how it ought to look.
>
> > The rub is this: how do you know when a file edit/modify has
> > completed?
>
> Not my problem, I'm sorry; this is the task of the engineer, the
> implementer. (See 'atomic', above.) It would be a shame if a file
> system never knew whether an operation had completed.
>
> > If an application has many files then an edit/modify may include
> > updates and/or removals of more than one file. So once again: how
> > do you know when an edit/modify has completed?
>
> So an 'edit' fires off a few child processes to do this and that, and
> then you forget about them, hoping they do a proper job? Oh, this
> gives me confidence ;) No, seriously, let's look at some applications:
>
> A. A user works in Office (StarOffice, sure!) and clicks 'Save' on the
> current work before making major modifications. So the last state of
> the document (odt) is stored. Currently we can set a backup option to
> run regularly - meaning that the backup could happen at the very wrong
> moment - while saving the state on each user request to 'Save' is much
> better.
> B. A bunch of e-mails are read from the inbox and stored locally
> (think Maildir). The user sees the sender, doesn't know her, and
> deletes all of them. Of course, the deletion process will have fired
> up the CDP engine ('event') and retired the files instead of deleting
> them. So when the sender calls and the user learns that he made a big
> mistake, he can roll back to before the deletion (event).
>
> C. (Sticking with /home.) I agree with you that the rather continuous
> changes to the dot-files and dot-directories in the user's HOME that
> serve JDS, and many more, may not necessarily allow reconstituting a
> valid state of the settings at each and every moment. Still, chances
> are high that they will. In the worst case, the unlucky user can roll
> back to when he last took a break, if only to grab another coffee,
> because it took a minute and the writes (see above) will hopefully
> have completed. Oh s***, already messed up the settings? Then try to
> roll back to the lunch break. Works? Okay! But when you roll back to
> the lunch break, where is the work done in between? With the backup
> solution it is lost. With the event-driven one (CDP) it is not: you
> can roll over all the states of files or directories between the
> lunch break and now, and recover the third-latest version of your
> tendering document (see above) within the settings of the desktop
> that were valid this morning. Maybe Sun can't do this, but wait for
> Apple, and OS X 10.something (using ZFS as default!) will know how to
> do it. (And they probably also know when their 'writes' are done.)
>
> Uwe
Re: [zfs-discuss] Can ZFS be event-driven or not?
> I think you're just looking for frequent backups, not necessarily
> capturing every unique file version.

Thanks for your reply, Joe, but this is not my intention. I agree that my arguments here look like moving targets; they simply developed along the lines of the discussion. I'd still target every unique file version - of course not the transient ones, only those versions that have been written completely to disk. We will for a looong time not be able to reconstitute each and every moment in time, though I am pretty sure we can achieve a reconstitution of each and every moment at which a write operation completed.

If Nico were correct, the whole of ZFS wouldn't make sense. If Nico were correct, even on 'the other operating system' data would frequently be lost. Just think of a crash - a power outage without a UPS: we don't know the states of the files, but in 99.9% of cases the states of the files on the hard drive allow a proper reboot. Meaning, as far as I can see, that the state of the files on a hard drive is usually consistent. Even with VFAT, or UFS.

When I do very frequent backups (once per minute, say), I get a lot of overhead, metadata, and system activity on almost all unmodified files - and still I might miss a relevant change. I argued in the other post that once I do very, very frequent backups (once per second, say), I will be fine, because I have the state before and after that second. Even *true* CDP would probably not require that intermediate state (again, aside from some specific applications, like databases - but that is solved within the applications), which also might not have been completely written to the drive. This, I understand, is where Nico is in agreement with me: any completed write needs to be CDP-ed.

And here we reach square one: while all those inotify-s and file-events notifications are needed for a TimeMachine, my fear is still that they work at too high a level and need too many resources.
As I wrote, I have no clue about the internals of ZFS, but I was hoping the file system itself could do all that is necessary. If configured to approach CDP behaviour on a rapidly changing filesystem, one can imagine ADM hammering a filesystem and still not keeping up. Again, too-frequent polling wastes resources. As long as we have the notion of time-induced backups, we're lost in any case. But even polling a flag and then getting into action is wasteful. Again, the file system itself probably needs to know how, and perform the right action on its own.

Uwe
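The event-driven retention Uwe describes can be sketched in user space. This is a minimal illustration only: the `on_write_complete` hook is hypothetical, standing in for whatever a filesystem (or an inotify/FEM layer) would invoke once a write has fully reached disk; `VersionStore` is not a ZFS interface.

```python
import hashlib
import shutil
from pathlib import Path

class VersionStore:
    """Keep every completed version of a file, addressed by content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.history = {}  # path -> list of version hashes, oldest first

    def on_write_complete(self, path):
        # Hypothetical hook: a real CDP engine would be told by the
        # filesystem that this write has reached stable storage.
        path = Path(path)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        versions = self.history.setdefault(str(path), [])
        if not versions or versions[-1] != digest:
            # New unique version: retain a copy, keyed by its hash.
            shutil.copy2(path, self.root / digest)
            versions.append(digest)

    def rollback(self, path, steps_back):
        """Restore the version from 'steps_back' saves ago."""
        versions = self.history[str(path)]
        shutil.copy2(self.root / versions[-1 - steps_back], path)
```

Note that because only completed, unique versions are retained, this avoids the per-minute polling overhead Uwe objects to: an unmodified file costs nothing.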
Re: [zfs-discuss] Can ZFS be event-driven or not?
Nathan Kroenert [EMAIL PROTECTED] wrote: Are you indicating that the filesystem knows, or should know, what an application is doing?

Maybe snapshot the file whenever a write file descriptor is closed, or somesuch?

- Marcus
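Marcus's suggestion can be sketched in user space. Assumptions flagged: a real implementation would hook a close-of-write-descriptor event at the filesystem level (e.g. Linux inotify's IN_CLOSE_WRITE, or a Solaris file event monitor) rather than wrapping open(); the `on_close` callback here merely stands in for whatever takes the actual snapshot (e.g. running `zfs snapshot`).

```python
class SnapshotOnClose:
    """Wrap a writable file and invoke a snapshot hook when it is closed.

    The hook is a stand-in for the real snapshot mechanism; in-kernel,
    the trigger would be a close-of-write-descriptor event, not this
    user-space wrapper.
    """

    def __init__(self, path, on_close):
        self._f = open(path, "w")
        self._path = path
        self._on_close = on_close

    def write(self, data):
        self._f.write(data)

    def close(self):
        self._f.close()             # data handed to the filesystem
        self._on_close(self._path)  # fire the 'event': snapshot now
```

Nico's objection elsewhere in the thread still applies to this design: a per-file close event cannot see that several files together form one application state.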
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Sun, 17 Feb 2008, Mertol Ozyoney wrote: Hi Bob; When you have some spare time, can you prepare a simple benchmark report in PDF that I can share with my customers to demonstrate the performance of the 2540?

While I do not claim that it is simple, I have created a report on my configuration and experience. It should be useful for users of the Sun StorageTek 2540, ZFS, and Solaris 10 multipathing. See http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf or http://tinyurl.com/2djewn for the URL-challenged. Feel free to share this document with anyone who is interested.

Thanks,
Bob

==
Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Can ZFS be event-driven or not?
Uwe Dippel [EMAIL PROTECTED] writes: Any completed write needs to be CDP-ed.

And that is the rub, precisely. There is nothing in the app-kernel interface currently that indicates that a write has completed to a state that is meaningful to the application.
Re: [zfs-discuss] Can ZFS be event-driven or not?
Uwe Dippel wrote: atomic view? Your post was on the gory details of how ZFS writes. 'Atomic view' here means that 'save' of a file is an 'atomic' operation: at one moment in time you click 'save', and at some other moment in time it is done. It means indivisible, and from the perspective of the user this is how it ought to look. The rub is this: how do you know when a file edit/modify has completed? Not to me, I'm sorry; this is the task of the engineer, the implementer. (See 'atomic', as above.) It would be a shame if a file system never knew whether the operation was completed.

This is the consistency problem. It isn't enough to know a write() completed; you must also know that a group of write()s leaves the file in a state which is consistent for the application.

If an application has many files, then an edit/modify may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? So an 'edit' fires off a few child processes to do this and that, and then you forget about them, hoping for them to do a proper job. Oh, this gives me confidence ;) No, seriously, let's look at some applications:

A. User works in Office (StarOffice, sure!) and clicks 'Save' for the current work before making major modifications. So the last state of the document (odt) is being stored. Currently we can set some backup option to be done regularly. Meaning that the backup could have happened at the very wrong moment, while saving the state on each user request 'Save' is much better.

StarOffice can record changes, so you should never lose a change, no? Other editors and office suites have similar features. Some editors even keep backup copies of modified documents.

B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion will have fired up the CDP engine ('event') and retired the files instead of deleting them.
So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event).

SOX compliance? ;-)

C. (Sticking with /home/) I agree with you that the rather continuous changes to the dot-files and dot-directories in the user's HOME that serve JDS, and many more, may not always allow reconstituting a valid state of the settings at every moment. Still, chances are high that they will. In the worst case, the unlucky user can roll back to when he last took a break, if only to grab another coffee; because that took a minute, the writes (see above) will hopefully have completed. Oh, s***, already messed up the settings? Then try to roll back to the lunch break. Works? Okay! But when you roll back to the lunch break, where is the work done in between? With the backup solution it is lost. With the event-driven (CDP) one it is not: you can step through all the states of files or directories between lunch break and now, and recover the third-latest version of your tendering document (see above) within the settings of the desktop that were valid this morning.

Actually, there is a case where you wouldn't want this enabled for $HOME in general. I use a browser every day. Actually, I use several browsers every day. Each browser has a cache located somewhere in my home directory, and the cache is managed so that it won't grow very large. With CDP, I would fill my disk in a week or less, just by caching everything on the internet that I pass by. Similarly, I have an e-mail account that is POP-based and tends to collect large amounts of spam, which, due to some irritating circumstances, I can't remotely filter. I *really* don't want to fill up my disk with enlargement spam. The only thing that would get larger is my disk space requirement :-)

Maybe Sun can't do this, but wait for Apple, and OSX10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know when their 'writes' are done.)
I use firefox and thunderbird on my mac... so I guess I would fill up my disk with the internet and spam ;-/

-- richard
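The "group of write()s" problem Richard raises is exactly why editors conventionally save through a temporary file plus an atomic rename: readers (and a crash) only ever see the complete old document or the complete new one under the original name. A minimal sketch of that conventional POSIX recipe, nothing ZFS-specific:

```python
import os

def atomic_save(path, data):
    """Replace path so observers only ever see old or new contents whole."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # contents on stable storage before the rename
    os.replace(tmp, path)     # atomic when tmp and path share a filesystem
```

A CDP engine keyed on such rename-over events would capture exactly the completed saves Uwe asks for, though it would still miss application state spread over multiple files.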
Re: [zfs-discuss] Can ZFS be event-driven or not?
On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote: The rub is this: how do you know when a file edit/modify has completed? Not to me, I'm sorry; this is the task of the engineer, the implementer. (See 'atomic', as above.) It would be a shame if a file system never knew whether the operation was completed.

The filesystem knows if a filesystem operation completed. It can't know application state. You keep missing that.

If an application has many files, then an edit/modify may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? So an 'edit' fires off a few child processes to do this and that, and then you forget about them, hoping for them to do a proper job. Oh, this gives me confidence ;)

You'd rather the filesystem guess application state than have the app tell it? Weird. Your other alternative -- saving a history of every write -- doesn't work because you can't tell what point in time is safe to restore to.

No, seriously, let's look at some applications: A. User works in Office (StarOffice, sure!) and clicks 'Save' for the current work before making major modifications. So the last state of the document (odt) is being stored. Currently we can set some backup option to be done regularly. Meaning that the backup could have happened at the very wrong moment, while saving the state on each user request 'Save' is much better.

So modify the office suite to call a new syscall that says I'm internally consistent in all these files, and boom, the filesystem can now take a useful snapshot.

B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion will have fired up the CDP engine ('event') and retired the files instead of deleting them. So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event).
Now think of an application like this but which uses, say, SQLite (e.g., Firefox 3.x, Thunderbird, ...). The app might never close the database file, just fsync() once in a while. The DB might have multiple files (in the SQLite case that might be multiple DBs ATTACHed into one database connection). Also, an fsync() of a SQLite journal file is not as useful to CDP as an fsync() of the SQLite DB proper. Now add any of a large number of databases and apps to the mix and forget it -- the heuristics become impossible or mostly useless.

C. (Sticking with /home/) I agree with you, that the rather continuous changes of the dot-files and dot-directories in the users HOME that serve JDS, and many more, do eventually not necessarily allow to reconstitute a valid state of the settings at all and any moment. Still, chances are high, that they will. In the worst case, the

Chances? So what, we tell the user try restoring to this snapshot, log in again, and if stuff doesn't work, then try another snapshot? What if the user discovers too late that the selected snapshot was inconsistent, and by then they've made other changes?

unlucky user can roll back to when he last took a break, if only for grabbing another coffee, because it took a minute, the writes (see

That sounds mighty painful. I'd rather modify some high-profile apps to tell the filesystem that their state is consistent, so take a snapshot.

Maybe SUN can't do this, but wait for Apple, and OSX10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know, when their 'writes' are done.)

I'm giving you the best answer -- modify the apps -- and you reject it. Given how many important apps Apple controls, it wouldn't surprise me if they did what I suggest. We should do it too. But one step at a time. We need to set up a project, gather requirements, design a solution, ...
And since the solution will almost certainly entail modifications to apps where heuristics won't help, well, I think this would be a project with fairly wide scope, which means it likely won't go fast.

Nico
--
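Nico's SQLite point is easy to demonstrate: one logical commit can span several database files, so no single-file event marks a consistent cut. A runnable sketch (the file and table names are arbitrary examples):

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
main_db = os.path.join(workdir, "main.db")
side_db = os.path.join(workdir, "side.db")

conn = sqlite3.connect(main_db)
conn.execute("ATTACH DATABASE ? AS side", (side_db,))
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.execute("CREATE TABLE side.audit (order_id INTEGER)")

# One logical transaction touches both files (plus their journals).
# Only the commit below is an application-consistent point, and it
# spans main.db AND side.db -- no per-file heuristic can see that
# boundary.
with conn:  # commits atomically across both attached databases
    conn.execute("INSERT INTO orders VALUES (1)")
    conn.execute("INSERT INTO side.audit VALUES (1)")
```

A CDP engine watching either file in isolation could easily retain a version where the order exists but its audit row does not.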
Re: [zfs-discuss] Can ZFS be event-driven or not?
On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote: Nathan Kroenert [EMAIL PROTECTED] wrote: Are you indicating that the filesystem knows, or should know, what an application is doing? Maybe snapshot the file whenever a write file descriptor is closed, or somesuch?

Again: not enough. Some apps (many!) deal with multiple files.
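Nico's objection suggests the shape of the missing interface: the application names the set of files that are mutually consistent, flushes them, and asks for a recovery point. No such syscall exists today; this sketch is purely hypothetical, with the snapshot mechanism injected as a callback (in real life, something like running 'zfs snapshot').

```python
import os

def consistency_point(files, take_snapshot):
    """Hypothetical app->filesystem hint: 'these files are mutually
    consistent right now'.  take_snapshot stands in for whatever
    creates the recovery point (e.g. a ZFS snapshot)."""
    for path in files:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)  # ensure each file has reached stable storage
        finally:
            os.close(fd)
    take_snapshot()       # now a snapshot captures a meaningful state
```

Because the application, not the filesystem, decides when to call this, the multi-file problem disappears; the cost is exactly the per-app modification work Nico describes.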
Re: [zfs-discuss] Can ZFS be event-driven or not?
It occurred to me that we are likely missing the point here, because Uwe is thinking of this from a one-user-on-a-system sort of perspective, whereas most of the rest of us are thinking of it from a 'Solaris' perspective, where we typically expect the system to be running many applications / DBs / users all at the same time. In Uwe's use cases thus far, it seems that he is interested only in simple, single-user style applications, if I'm not mistaken, so he's not considering the consequences of what it *really* means to have CDP in the way he wishes. Uwe - am I close here?

Nathan.
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Wed, Feb 27, 2008 at 6:17 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Sun, 17 Feb 2008, Mertol Ozyoney wrote: Hi Bob; When you have some spare time, can you prepare a simple benchmark report in PDF that I can share with my customers to demonstrate the performance of the 2540? While I do not claim that it is simple, I have created a report on my configuration and experience. It should be useful for users of the Sun StorageTek 2540, ZFS, and Solaris 10 multipathing. See http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf

Nov 26, 2008??? May I borrow your time machine? ;-)

--
Regards, Cyril