Re: [zfs-discuss] Cause for data corruption?

2008-02-26 Thread Nicolas Szalay
On Monday, 25 February 2008 at 11:05 -0800, Sandro wrote:
 hi folks

Hi,

 I've been running my fileserver at home with Linux for a couple of years, and 
 last week I finally reinstalled it with Solaris 10 U4.
 
 I borrowed a bunch of disks from a friend, copied over all the files, 
 reinstalled my fileserver and copied the data back.
 
 Everything went fine, but after a few days quite a lot of files got 
 corrupted.
 Here's the output:
 
  # zpool status data
   pool: data
  state: ONLINE
 status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: scrub completed with 422 errors on Mon Feb 25 00:32:18 2008
 config:
 
  NAME        STATE     READ WRITE CKSUM
  data        ONLINE       0     0 5.52K
    raidz1    ONLINE       0     0 5.52K
      c0t0d0  ONLINE       0     0 10.7K
      c0t1d0  ONLINE       0     0 4.59K
      c0t2d0  ONLINE       0     0 5.18K
      c0t3d0  ONLINE       0     0 9.10K
      c1t0d0  ONLINE       0     0 7.64K
      c1t1d0  ONLINE       0     0 3.75K
      c1t2d0  ONLINE       0     0 4.39K
      c1t3d0  ONLINE       0     0 6.04K
 
 errors: 388 data errors, use '-v' for a list
 
 Last night I found out about this; at that point it reported errors in about 50 
 files.
 So I scrubbed the whole pool, and it found a lot more corrupted files.
 
 The temporary system I used to hold the data while installing 
 Solaris on my fileserver is running NV build 80, and there are no errors on it.
 
 What could be the cause of these errors?
 I don't see any hardware errors on my disks:
 
  # iostat -En | grep -i error
 c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c4d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c0t0d0   Soft Errors: 574 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c1t0d0   Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c0t1d0   Soft Errors: 14 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c0t2d0   Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c0t3d0   Soft Errors: 549 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c1t1d0   Soft Errors: 548 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c1t2d0   Soft Errors: 14 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 c1t3d0   Soft Errors: 548 Hard Errors: 0 Transport Errors: 0
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 
 That said, there are a lot of soft errors.
 Linux said that one disk had gone bad, but I figured the SATA cable was 
 somehow broken, so I replaced that before installing Solaris. And Solaris 
 didn't and doesn't see any actual hardware errors on the disks, does it?

I had the same symptoms recently. I also thought the disks were dying, but
I was wrong. I suspected the RAM: no. In the end it was because I had mixed
RAID cards across different PCI buses: two 64-bit buses (no problem with
those) and one 32-bit PCI bus, which caused *all* the checksum errors.

Kicked out the card on the 32-bit PCI bus and everything worked fine.
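
Once the offending card is out, something like the following should reset the
error counters and re-verify the pool (standard ZFS commands, using the pool
name from your output):

 # zpool clear data
 # zpool scrub data
 # zpool status -v data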

Hope it helps,

-- 
Nicolas Szalay

Systems & network administrator

--
ASCII ribbon campaign ( )
 - against HTML email  X
   & vCards           / \




Re: [zfs-discuss] Preferred backup s/w

2008-02-26 Thread Joerg Schilling
Rich Teer [EMAIL PROTECTED] wrote:

  People who like to back up usually also like to do incremental backups.
  Why don't you?

 I do like incremental backups.  But the ability to do incremental backups
 and restore arbitrary files from an archive are two different things.  An
 incremental backup backs up files that have changed since the most recent
 backup, so suppose my home directory contains 1000 files, 100 of which have
 changed since my last backup.  I perform an incremental backup of my home
 directory, and the resulting archive contains those 100 files.

 Now suppose that I accidentally delete a couple of those files; it is very
 desirable to be able to restore just a certain named subset of the files
 in an archive rather than having to restore the whole archive.  I'm looking
 for a tool that can do that.

Hi Rich, I asked you a question that you have not yet answered:

Are you interested only in full backups and in the ability to restore single 
files from that type of backup?

Or are you interested in incremental backups that _also_ allow you to reduce the
daily backup size but still give you the ability to restore single files?


I am asking this because there are some backup programs that do not fit into the
categories above: the Amanda people, for example, call something an "incremental
backup" that does not allow you to restore an empty disk up to the state of the
last incremental. Amanda in this case suffers from the problem that GNU tar does
not let you restore onto an empty disk if someone renamed directories in a way
that triggers the conceptual problems in GNU tar.
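
For reference, the GNU tar mechanism in question is the listed-incremental one;
the paths below are only illustrative:

 # gtar --listed-incremental=/backup/home.snar -cf /backup/home-full.tar /export/home
 # gtar --listed-incremental=/backup/home.snar -cf /backup/home-incr1.tar /export/home

The second run writes only files changed since the state recorded in the .snar
file; it is restores of such incrementals that break after directory renames.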

So it seems important to me to first find out what kind of backup you are 
interested in.

Please answer my questions!

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Preferred backup s/w

2008-02-26 Thread Joerg Schilling
michael schuster [EMAIL PROTECTED] wrote:

 Rich never said so. He said the ability to do incremental backups and 
 restore arbitrary files from an archive are two different things. You were 
 addressing an issue he never brought up.

I really don't understand why you did not answer my question. 
It is obvious that there is some confusion in the question, and it is not 
possible to continue the discussion if you do not try to help solve this 
problem.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Preferred backup s/w

2008-02-26 Thread Joerg Schilling
Darren J Moffat [EMAIL PROTECTED] wrote:

 zfs-discuss is fine, but the thread has gone into non-ZFS-related, generic 
 backup stuff.  If there are ZFS specifics - like the question 
 about extended attributes - then I think this is a reasonable place to 
 discuss them.  Discussion about the nomenclature of Amanda when it does not 
 concern ZFS is not appropriate here.

You are welcome to create a mailing list for generic backup stuff.

The discussion here seems to have been started by people who are looking for
a backup solution suitable for ZFS.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil

2008-02-26 Thread msl
 For Linux NFS service, it's an option in
 /etc/exports.
 
 The default for modern (post-1.0.1) NFS utilities
 is sync, which means that data and metadata will be
 written to the disk whenever NFS requires it
 (generally upon an NFS COMMIT operation).  This is
 the same as Solaris with UFS, or with ZFS+ZIL. This
 works with XFS, EXT3, and any other file system with
 a working fsync().
 OK, I did know that; I forgot to mention in my question that my doubt was 
whether Linux would really honour the sync. Do you understand? I had read that 
Linux does not (even with sync in exports). In NFSv2, for example, it does not 
matter whether you put sync or async: the server will ACK as soon as it receives 
the request (a no-op). But if you are telling me that Linux *now* really syncs 
the disks before ACKing the client, well... then there is a huge difference 
between zfs/nfs and xfs/nfs, because the numbers that I posted are with sync on Linux.
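
For reference, this is the /etc/exports option I mean; the paths and addresses
below are made up:

 /export/data     192.168.1.0/24(rw,sync)    # commit to disk before replying
 /export/scratch  192.168.1.0/24(rw,async)   # may reply before data is on disk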
 
 It's possible to switch this off on Linux, but not
 recommended, as there is a chance that data could be
 lost if the server crashed. (For the same reason, the
 ZIL should not be disabled on a Solaris NFS server.)
 I understand that, so I did not even try to disable the ZIL. All the 
tests I made respected a semantically correct NFS service. If only 
the ZIL could be configured per filesystem, or per pool...
 The difference is 7.5 s vs. 1.0 s, and theoretically ZFS is more efficient than XFS.
 
 


[zfs-discuss] modification to zdb to decompress blocks

2008-02-26 Thread [EMAIL PROTECTED]
Hi All,
I have modified zdb to do decompression in zdb_read_block.  Syntax is:

# zdb -R poolname:devid:blkno:psize:d,compression_type,lsize

Where compression_type can be lzjb or any other compression type that zdb 
uses, and lsize is the size after decompression (psize is the physical, 
compressed size).  I have used this with a modified mdb to allow one to 
do the following:

given a pathname for a file on a zfs file system, display the blocks 
(i.e., data) of the file.  The file
system need not be mounted.
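
For illustration, an invocation might look like this (the pool name, device
id, block number, and sizes are all made up):

 # zdb -R tank:0:20000:4000:d,lzjb,20000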

If anyone is interested, send me email and I can send a webrev of the zdb 
changes.
As for the mdb changes, I sent a webrev of those a while ago, and have 
since added a rawzfs dmod.

I plan to present a paper at osdevcon in Prague in June that uses the 
modified zdb and mdb to
show the physical layout of a zfs file system.  (I should mention that, 
over time, I have found that
the ZFS on-disk format paper actually does tell you almost everything).

max



Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil

2008-02-26 Thread Roch Bourbonnais

I would imagine that Linux behaves more like ZFS does when it is not flushing  
caches (google Evil zfs_nocacheflush).

If you can tar-extract files over NFS on Linux faster than one file per  
rotational latency, that is suspicious.
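
For reference, that tunable is set via /etc/system on Solaris and takes effect
at the next reboot; as discussed earlier in this thread, it is unsafe on an
NFS server:

 set zfs:zfs_nocacheflush = 1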

-r

On 26 Feb 08, at 13:16, msl wrote:

 For Linux NFS service, it's an option in
 /etc/exports.

 The default for modern (post-1.0.1) NFS utilities
 is sync, which means that data and metadata will be
 written to the disk whenever NFS requires it
 (generally upon an NFS COMMIT operation).  This is
 the same as Solaris with UFS, or with ZFS+ZIL. This
 works with XFS, EXT3, and any other file system with
 a working fsync().
 OK, I did know that; I forgot to mention in my question that my  
 doubt was whether Linux would really honour the sync. Do you  
 understand? I had read that Linux does not (even with sync in  
 exports). In NFSv2, for example, it does not matter whether you put sync or  
 async: the server will ACK as soon as it receives the request (a no-op).  
 But if you are telling me that Linux *now* really syncs the disks  
 before ACKing the client, well... then there is a huge difference between  
 zfs/nfs and xfs/nfs, because the numbers that I posted are with sync  
 on Linux.

 It's possible to switch this off on Linux, but not
 recommended, as there is a chance that data could be
 lost if the server crashed. (For the same reason, the
 ZIL should not be disabled on a Solaris NFS server.)
 I understand that, so I did not even try to disable the ZIL.  
 All the tests I made respected a semantically correct  
 NFS service. If only the ZIL could be configured per filesystem, or per pool...
 The difference is 7.5 s vs. 1.0 s, and theoretically ZFS is more efficient  
 than XFS.




Re: [zfs-discuss] Cause for data corruption?

2008-02-26 Thread Sandro
Hey

Thanks for your answers guys.

I'll run VTS to stress-test the CPU and memory.

And I just checked the block diagram of my motherboard (Gigabyte M61P-S3).
It doesn't even have 64-bit PCI slots, just standard old 33 MHz 32-bit PCI and 
a couple of newer PCIe slots.
But my two controllers are the same vendor/version and are both 
connected to the same PCI bus.
 
 


Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil

2008-02-26 Thread msl
Actually, I have some corrections to make. When I saw the numbers, I was 
stunned, and that kept me from thinking straight.
You can see the correct numbers here: http://www.posix.brte.com.br/blog/?p=104
The problem was the disks on which I ran the tests.
 Thanks for your time.
 
 


Re: [zfs-discuss] Preferred backup s/w

2008-02-26 Thread Rich Teer
On Tue, 26 Feb 2008, Joerg Schilling wrote:

 Hi Rich, I asked you a question that you did not yet answer:

Hi Jörg,

  Are you interested only in full backups and in the ability to restore single 
  files from that type of backup?

  Or are you interested in incremental backups that _also_ allow you to reduce the
  daily backup size but still give you the ability to restore single files?

Both: I'd like to be able to restore single files from both full and
incremental backups of a ZFS file system.

-- 
Rich Teer, SCSA, SCNA, SCSECA, OGB member

CEO, My Online Home Inventory

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
  http://www.myonlinehomeinventory.com


Re: [zfs-discuss] Preferred backup s/w

2008-02-26 Thread Joerg Schilling
Rich Teer [EMAIL PROTECTED] wrote:

   Are you interested only in full backups and in the ability to restore single 
   files from that type of backup?

   Or are you interested in incremental backups that _also_ allow you to reduce the
   daily backup size but still give you the ability to restore single files?

 Both: I'd like to be able to restore single files from both a full and
 incremental backup of a ZFS file system.

OK, then the only filesystem-independent program I know of that can do what 
you'd like is star (a rough usage sketch follows the list below).

-   The solution from David Korn's site does differential backups and thus
    cannot easily restore single files.

-   GNU tar fails with incremental restores if there was a specific kind 
    of directory rename between two incrementals.

-   Other programs do not support incrementals at all.
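
For example, roughly (illustrative paths; see star(1) for the exact
incremental-dump options):

 # star -c -dump level=0 f=/backup/home-full.star -C /export home
 # star -c -dump level=1 f=/backup/home-incr.star -C /export home
 # star -x f=/backup/home-incr.star home/rich/some-deleted-file

The last command restores a single named file from the incremental archive.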



Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Uwe Dippel
On Tue, Feb 26, 2008 at 2:07 PM, Nicolas Williams
[EMAIL PROTECTED] wrote:
 How do you use CDP backups?  How do you decide at which write(2) (or
  dirty page write, or fsync(2), ...) to restore some file?  What if the
  app has many files?  Point-in-time?  Sure, but since you can't restore
  all application state (unless you're checkpointing processes too) then
  how can you be sure that the data to be restored is internally
  consistent?  And if you'll checkpoint processes, then why not just use
  VMs and checkpoint those and their filesystems instead?  The last option
  sounds much, much simpler to manage: there's only VM name and timestamp
  to think about when restoring.  A continuous VM checkpoint facility
  sounds... unlikely/expensive though.

Sorry, I don't understand any of this. But I never pretended I did.
My post was on something else:
In principle we have three types of write; atomic view, please:
1. Create. The new file needs to be written only; no backup/CDP
needed; identical to any conventional system.
2. Edit/Modify. Here we need to store some incremental/differential
file content; rsync-like, that is.
3. Remove. This is also similar to the conventional system, except
that the files need to be retired and the blocks *not* marked as
'available'.

Changes combined with a 'write'/'save' instruction are not seen very
frequently on personal/home machines. (Let's leave out the web cache
and /tmp.) But even on the servers that I am running, the gigabytes
of user data do not change very much, seen as a percentage of the overall
data. Most of the 200,000 files that the users have remain unmodified
for ages. Office files do change, but not much faster than the
users can type ;) . Web content changes rarely; style sheets and icons
remain unmodified close to forever. The largest changes come with
system/software upgrades. (One might even discuss excluding these
from CDP and instead automating a snapshot before, in case of a problem
afterwards. But that is not my topic here and now.)

Also, the granularity of the 'backups' does not really have to be
100%. If, for reasons I cannot imagine, a certain file were
marked for 'save' thrice in a single second, of course you don't need
all the states. You do have the state at the start of that one second
(to which you can roll), as well as the state at the end of that
second (to which you can roll just as well; and you can even roll back
and forward). I can hardly imagine a data file to which one would want
to roll which was invalid at the start of that second, is invalid at
the end, but was valid for some milliseconds in between. (How one could
know about this intermediate correctness would have to be asked.)
Outside of databases, a valid state once per 10 seconds is probably
overkill already. Don't forget: even if you deleted the file, it will
still be there. If you 'save' a file, make a change, 'save' again,
make a mistake and 'save' again, notice you made a mistake ... and all
this within 10 seconds! ... you will still have the state at the beginning
of the 10 seconds, as well as the state at the end of those 10
seconds. 10 seconds is a hell of a lot of time to calculate and store
an incremental difference of a single file. Whereas for a TimeMachine,
10 seconds can be a hell of a short time, plus there is huge overhead
there, because you need to poll regularly, possibly at much too
high a level, to find which files have changed. Actually, chances are that
none at all have changed (at least in the /home of the user, or even in the
/home of the user*s*). Once it is event-driven, 'no change' means no
activity at all. And once it is event-driven and you have 3 changes in 10
seconds, I am pretty sure that all the states can be handled without much
trouble.

Uwe


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Nicolas Williams
On Wed, Feb 27, 2008 at 01:45:41AM +0800, Uwe Dippel wrote:
 Sorry, I don't understand any of this. But I never pretended I did.

Well, if you want some feature then you should understand what it is.
Sure, continuous data protection sounds really good, but you have to
understand that any CDP solution has to have knowledge of, or even be
driven by, your applications -- otherwise it isn't really CDP.  This is
explained below.

 My post was on something else:
 In principle we have three types of write; atomic view, please:

atomic view?

 1. Create. The new file needs to be written only, no backup/CDP
 needed; identical to any conventional system.
 2. Edit/Modify. Here we need to store some incremental/differential
 file content. rsync-like, that is.

The rub is this: how do you know when a file edit/modify has completed?

The answer is: it depends on what application we're talking about!

 3. Remove. Also this is similar to the conventional system, except
 that the files need to be retired and the blocks *not* be marked as
 'available'.

If an application has many files then an edit/modify may include
updates and/or removals of more than one file.  So once again: how do
you know when an edit/modify has completed?  The answer is still the
same.

My point is this: because the interesting times at which to take
checkpoints are application-specific, we can't have a useful
application-independent CDP solution.

An application-independent CDP solution would not necessarily (not
likely!) produce checkpoints that are safe to restore to.

If you don't know whether it's safe to restore to a given checkpoint,
and finding out is hard, then what use is that checkpoint?  And if you
know it isn't safe then the checkpoint is truly useless -- it'll just
sit there, taking up space.

CDP really must be an application feature.  Using ZFS snapshots could
certainly make it easier to implement app-level CDP, and having the
ability to snapshot/clone at a finer granularity than datasets (e.g.,
per-file) would help too.  But ZFS _alone_ cannot provide a useful CDP
solution.
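
For instance, an application that has just reached an internally consistent
state could trigger something as simple as this (hypothetical dataset name):

 # zfs snapshot tank/appdata@consistent-20080226T1200

and later, if a restore is needed:

 # zfs rollback tank/appdata@consistent-20080226T1200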

Nico
-- 


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Joe Blount




Can someone please point me to a link, or just unambiguously say 'yes'
or 'no' to my question: whether ZFS could produce a snapshot of whatever
type, initiated by a signal that in turn is derived from a change
(edit) of a file, like inotify in Linux 2.6.13 and above.

Hi Uwe,

I wasn't previously familiar with inotify, so I may be off here... But
as I understand it, inotify generates asynchronous events, which
something else consumes (e.g. a backup tool). I believe the
asynchronous nature of inotify prevents it from enabling true CDP,
i.e. it would enable very frequent backups, but there may still be
rewrites occurring before the first async event is delivered and
processed.

But based on your later comments... I think you're just looking for
very frequent backups, but not necessarily capturing every unique file
version?

You might want to look at the information we've started posting about
ADM (an HSM). There are two general use cases for ADM: a backup
solution, and a disk extender.

ADM will be using a subset of DMAPI to monitor file system activity.
After skimming some brief info on inotify, I believe DMAPI is similar
to inotify. ADM will be using it to receive file modification events
(among other event types), which, based on policy, will trigger archive
requests to tape and/or disk archives. Note that ADM will only be
archiving whole files (not just the incremental changes).

Additionally, since it's an HSM, archived files may (based on policy,
etc.) be released from the file system. This is the disk extender
part. Think of it as an under-the-covers truncate that frees the
disk space. When the file data is accessed in the future, events
trigger ADM to stage the file back in from the archives. Users would
notice a delay (as it is staged in), but would not have to take
explicit action to get the file data resident again. Releasing files
will of course be optional.

ADM could provide frequent backups, if configured to make archives soon
after file modifications. Since we archive the whole file, this would
not be appropriate for large files with frequent small changes.
Also, frequent backups would only be appropriate for disk archiving
(due to tape load times and tape wear). 

Keep in mind that CDP is not the design center here. If configured to
approach CDP behavior on a rapidly changing filesystem, one can imagine
it hammering a filesystem and still not keeping up. 

Also, ADM archives are very different from ZFS snapshots. We have not
yet defined how a user would explicitly access a specific archive. The
expectation is that we'll provide a way to see all the versions we have
for a file, and the user can tell us either to restore it over the current
contents of the file, or to restore it to a new file.

http://opensolaris.org/os/project/adm/WhatisADM/

-Joe





Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Joe Blount
 Can someone please point me to link, or just
 unambiguously say 'yes' or 'no' to my question, if
 ZFS could produce a snapshot of whatever type,
 initiated with a signal that in turn is derived from
 a change (edit) of a file; like inotify in Linux
 2.6.13 and above.

Hi Uwe,

As I understand it, inotify generates asynchronous events, which something else 
consumes (e.g. a backup tool).  I believe the asynchronous nature of inotify 
prevents it from enabling “true” CDP, i.e. it would enable very frequent 
backups, but there may still be rewrites occurring before the first async event 
is delivered and processed.  But based on your later comments... I think you're 
just looking for frequent backups, not necessarily capturing every unique file 
version.

You might want to look at the information we've started posting about ADM (an 
HSM).  There are two general use cases for ADM: a backup solution, and a disk 
extender.

ADM will be using a subset of DMAPI to monitor file system activity.  After 
skimming some brief info on inotify, I believe it is similar to DMAPI.  ADM 
will be using DMAPI to receive file modification events (among other event 
types), which, based on policy, will trigger archive requests to tape and/or 
disk archives.  ADM will only be archiving whole files (not just the 
incremental changes).  

ADM could provide frequent backups, if configured to make archives soon after 
file modifications.  Since we archive the whole file, this would not be 
appropriate for large files with frequent small changes.  Also, frequent 
backups would be more appropriate for disk archiving (due to tape load times 
and tape wear).  

Additionally, since it's an HSM, archived files may (based on policy, etc.) be 
released from the file system.  This is the “disk extender” part.  Think of it 
as an “under the covers” truncate that frees the disk space.  When the file 
data is accessed in the future, events trigger ADM to stage the file back in 
from the archives.  Users would notice a delay (as it is staged in), but would 
not have to take explicit action to get the file data resident again.  

Keep in mind that CDP is not the design center here.  If configured to approach 
CDP behavior on a rapidly changing filesystem, one can imagine ADM hammering a 
filesystem and still not keeping up.  

Also, ADM archives are very different from ZFS snapshots.  We have not yet 
defined how a user would explicitly access a specific archive.  The expectation 
is that we'll provide a way to see all the versions we have for a file, and the 
user can tell us either to restore it over the current contents of the file, or 
to restore it to a new file.

http://opensolaris.org/os/project/adm/WhatisADM/

(my apologies if this shows up multiple times – I tried replying to the email 
alias and it just said “An HTML attachment was scrubbed”)

-Joe
 
 


[zfs-discuss] path-name encodings

2008-02-26 Thread Marcus Sundman
Are path names text or raw data in ZFS? I.e., is it possible to know
what the name of a file/dir/whatever is, or do I have to make more or
less wild guesses about which encoding is used where?

- Marcus


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Nathan Kroenert
Are you indicating that the filesystem knows, or should know, what an 
application is doing?

It seems to me that to achieve what you are suggesting, that's exactly 
what it would take.

Or you are assuming that there are no co-dependent files in the 
applications out there...

Whichever the case, I'm confused...!

Unless you are perhaps suggesting an IOCTL that an application 
could call to indicate I'm done for this round, please snapshot, or 
something to that effect. Even then, I'm still confused as to how I 
would do anything much more useful with this than, say, the 1-minute 
snapshots sketched below.
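
(For concreteness, 1-minute snapshots are no more than a crontab entry like
this, with a hypothetical dataset name; the backslashes before % are required
by cron:)

 * * * * * /usr/sbin/zfs snapshot pool/home@auto-`date +\%Y\%m\%d\%H\%M`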

Nathan.


Uwe Dippel wrote:
 atomic view?
 
 Your post was about the gory details of how ZFS writes. The atomic view here is 
 that 'save' of a file is an 'atomic' operation: at one moment in time you 
 click 'save', and at some other moment in time it is done. It means indivisible, 
 and from the perspective of the user this is how it ought to look.
 
 The rub is this: how do you know when a file edit/modify has completed?
 
 Not to me, I'm sorry; this is the task of the engineer, the implementer. (See 
 'atomic', as above.)
 It would be a shame if a file system never knew whether the operation was 
 completed.
 
 If an application has many files then an edit/modify may include
 updates and/or removals of more than one file. So once again: how do
 you know when an edit/modify has completed?
 
 So an 'edit' fires off a few child processes to do this and that, and then you 
 forget about them, hoping they do a proper job. 
 Oh, this gives me confidence ;)
 
 No, seriously, let's look at some applications:
 
 A. A user works in Office (StarOffice, sure!) and clicks 'Save' on the current 
 work before making major modifications. So the last state of the document 
 (odt) is stored. Currently we can set some backup option to run 
 regularly, meaning that the backup could have happened at the very wrong 
 moment; saving the state on each user request to 'Save' is much better.
 
 B. A bunch of e-mails is read from the Inbox and stored locally (think 
 Maildir). The user sees the sender, doesn't know her, and deletes all of 
 them. Of course, the deletion process will have fired up the CDP engine 
 ('event') and retired the files instead of deleting them. So when the sender 
 calls, and the user learns that he made a big mistake, he can roll back to 
 before the deletion (event).
 
 C. (Sticking with /home/.) I agree with you that the rather continuous 
 changes to the dot-files and dot-directories in the user's HOME that serve 
 JDS, and many more, may not necessarily allow reconstituting a 
 valid state of the settings at each and every moment. Still, chances are high 
 that they will. In the worst case, the unlucky user can roll back to when he 
 last took a break, if only to grab another coffee; because it took a 
 minute, the writes (see above) will hopefully have completed. Oh, s***, 
 already messed up the settings? Then try to roll back to lunch break. Works? 
 Okay! But when you roll back to lunch break, where is the work done in 
 between? With the backup solution it is lost. With the event-driven one (CDP) 
 it is not: you can roll over all the states of files or directories since lunch 
 break and recover the third-latest version of your tendering document (see 
 above), within the settings of the desktop that were valid this morning.
 
 Maybe Sun can't do this, but wait for Apple, and OSX10-dot-something (using 
 ZFS as default!) will know how to do it. (And they probably also know when 
 their 'writes' are done.)
 
 Uwe
  
  


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Uwe Dippel
 I think you're just looking for frequent backups, not necessarily capturing 
 every unique file version.

Thanks for your reply, Joe, but this is not my intention. I agree that my 
arguments here look like moving targets; they simply developed along the lines 
of the discussion. I'd still target every unique file version; of course not the 
transient ones, only those versions that have been written completely to disk. 
We will, for a looong time, not be able to reconstitute each and every moment in 
time. Though I am pretty sure we can achieve a reconstitution of each and every 
moment of a completed write operation. If Nico were correct, the whole of ZFS 
wouldn't make sense. If Nico were correct, even with 'the other operating 
system' data would frequently be lost. Just think of a crash, a power outage 
without a UPS: we don't know the states of the files, but in 99.9% of cases 
the states of the files on the hard drive allow for a proper reboot. Meaning, 
AFAICS, that the state of files on a hard drive is usually consistent. Even 
with VFAT, or UFS. 
When I do very frequent backups (once per minute, e.g.), I get a lot of 
overhead, metadata, and system activity, almost all of it for unmodified files. 
And still, I might miss a relevant change. I was arguing in the other post that 
once I do very, very frequent backups (once per second, e.g.) I will be fine, 
because I have the state before and after that second. Even *true* CDP would 
probably not require that intermediate state (again, aside from some specific 
applications, like databases; but that is solved within the applications), 
which also might not have been completely written to the drive. This is, I 
understand, where Nico is in agreement with me. Any completed write needs to 
be CDP-ed.

And here we reach square one: while all those inotify-s and 
file-event notifications are needed for a TimeMachine, my fear is still that 
they work at too high a level and need too many resources. As I wrote, I have 
no clue about the internals of ZFS, but I was hoping the file system itself 
could do everything necessary.

 If configured to approach CDP behavior on a rapidly changing filesystem, one 
 can imagine ADM hammering a filesystem and still not keeping up.

Again, too-frequent polling wastes resources. As long as we have the notion 
of time-induced backups, we're lost in any case. But even polling a flag and 
then getting into action is wasteful. Again, the file system itself probably 
needs to know how, and perform the right action on its own.

Uwe
 
 


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Marcus Sundman
Nathan Kroenert [EMAIL PROTECTED] wrote:
  Are you indicating that the filesystem knows, or should know, what an 
  application is doing?

Maybe snapshot the file whenever a write file descriptor is closed, or
some such?


- Marcus


Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-26 Thread Bob Friesenhahn
On Sun, 17 Feb 2008, Mertol Ozyoney wrote:

  Hi Bob;

  When you have some spare time, can you prepare a simple benchmark report in
  PDF that I can share with my customers to demonstrate the performance of
  the 2540?

While I do not claim that it is simple, I have created a report on my 
configuration and experience.  It should be useful for users of the 
Sun StorageTek 2540, ZFS, and Solaris 10 multipathing.

See

http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf

or http://tinyurl.com/2djewn for the URL challenged.

Feel free to share this document with anyone who is interested.

Thanks

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Boyd Adamson
Uwe Dippel [EMAIL PROTECTED] writes:
 Any completed write needs to be CDP-ed.

And that is the rub, precisely. There is currently nothing in the
app-to-kernel interface that indicates that a write has completed to a state
that is meaningful to the application.


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Richard Elling
Uwe Dippel wrote:
 atomic view?
 

 Your post was about the gory details of how ZFS writes. The atomic view here is 
 that 'save' of a file is an 'atomic' operation: at one moment in time you 
 click 'save', and at some other moment in time it is done. It means indivisible, 
 and from the perspective of the user this is how it ought to look.

 The rub is this: how do you know when a file edit/modify has completed?
 

 Not to me, I'm sorry; this is the task of the engineer, the implementer. (See 
 'atomic', as above.)
 It would be a shame if a file system never knew whether the operation was 
 completed.

This is the consistency problem. It isn't enough to know a write()
completed, you must also know that a group of write()s leaves the
file in a state which is consistent for the application.

  If an application has many files then an edit/modify may include
  updates and/or removals of more than one file. So once again: how do
  you know when an edit/modify has completed?

 So an 'edit' fires off a few child processes to do this and that, and then you 
 forget about them, hoping they do a proper job. 
 Oh, this gives me confidence ;)

 No, seriously, let's look at some applications:

 A. A user works in Office (StarOffice, sure!) and clicks 'Save' on the current 
 work before making major modifications. So the last state of the document 
 (odt) is stored. Currently we can set some backup option to run 
 regularly, meaning that the backup could have happened at the very wrong 
 moment; saving the state on each user request to 'Save' is much better.

StarOffice can record changes.  So you should never lose a change, no?
Other editors and office suites have similar features.  Some editors even
keep backup copies of modified documents.

 B. A bunch of e-mails is read from the Inbox and stored locally (think 
 Maildir). The user sees the sender, doesn't know her, and deletes all of 
 them. Of course, the deletion process will have fired up the CDP engine 
 ('event') and retired the files instead of deleting them. So when the sender 
 calls, and the user learns that he made a big mistake, he can roll back to 
 before the deletion (event).

SOX compliance? ;-)

 C. (Sticking with /home/.) I agree with you that the rather continuous 
 changes to the dot-files and dot-directories in the user's HOME that serve 
 JDS, and many more, may not necessarily allow reconstituting a 
 valid state of the settings at each and every moment. Still, chances are high 
 that they will. In the worst case, the unlucky user can roll back to when he 
 last took a break, if only to grab another coffee; because it took a 
 minute, the writes (see above) will hopefully have completed. Oh, s***, 
 already messed up the settings? Then try to roll back to lunch break. Works? 
 Okay! But when you roll back to lunch break, where is the work done in 
 between? With the backup solution it is lost. With the event-driven one (CDP) 
 it is not: you can roll over all the states of files or directories since lunch 
 break and recover the third-latest version of your tendering document (see 
 above), within the settings of the desktop that were valid this morning.

Actually, there is a case where you wouldn't want this
enabled for $HOME in general.  I use a browser every
day.  Actually, I use several browsers every day.  Each
browser has a cache located somewhere in my home
directory, and the cache is managed so that it won't
grow very large.  With CDP, I would fill my disk in
a week or less, just by caching everything on the
internet that I pass by.

Similarly, I have an e-mail account that is POP-based
and tends to collect large amounts of spam, which, due
to some irritating circumstances, I can't filter remotely.
I *really* don't want to fill up my disk with enlargement
spam.  The only thing that would get larger is my disk
space requirement :-)

 Maybe Sun can't do this, but wait for Apple, and OSX10-dot-something (using 
 ZFS as default!) will know how to do it. (And they probably also know when 
 their 'writes' are done.)

I use Firefox and Thunderbird on my Mac... so I guess I
would fill up my disk with the internet and spam ;-/
 -- richard




Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Nicolas Williams
On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote:
  The rub is this: how do you know when a file edit/modify has completed?
 
 Not to me, I'm sorry, this is task of the engineer, the implementer.
 (See 'atomic', as above.) It would be a shame if a file system never
 knew if the operation was completed.

The filesystem knows if a filesystem operation completed.  It can't know
application state.  You keep missing that.

  If an application has many files then an edit/modify may include
  updates and/or removals of more than one file. So once again: how do
  you know when an edit/modify has completed?
 
 So an 'edit' fires off a few child processes to do this and that and
 then you forget about them, hoping they do a proper job.  Oh,
 this gives me confidence ;)

You'd rather the filesystem guess application state than have the app
tell it?  Weird.  Your other alternative -- saving a history of every
write -- doesn't work because you can't tell what point in time is safe
to restore to.

 No, seriously, let's look at some applications:
 
 A. User works in Office (Star-Office, sure!) and clicks 'Save' for a
 current work before making major modifications. So the last state of
 the document (odt) is being stored. Currently we can set some Backup
 option to be done regularly. Meaning that the backup could have
 happened at the very wrong moment; while saving the state on each user
 request 'Save' is much better.

So modify the office suite to call a new syscall that says "I'm
internally consistent in all these files" and boom, the filesystem can
now take a useful snapshot.

 B. A bunch of e-mails are read from the Inbox and stored locally
 (think Maildir). The user sees the sender, doesn't know her, and
 deletes all of them. Of course, the deletion process will have fired
 up the CDP-engine ('event') and retire the files instead of deletion.
 So when the sender calls, and the user learns that he made a big
 mistake, he can roll back to before the deletion (event).

Now think of an application like this but which uses, say, SQLite (e.g.,
Firefox 3.x, Thunderbird, ...).  The app might never close the database
file, just fsync() once in a while.  The DB might have multiple files
(in the SQLite case that might be multiple DBs ATTACHed into one
database connection).  Also, an fsync of a SQLite journal file is not
as useful to CDP as an fsync() of a SQLite DB proper.  Now add any of a
large number of databases and apps to the mix and forget it -- the
heuristics become impossible or mostly useless.

 C. (Sticking with /home/) I agree with you, that the rather continuous
 changes of the dot-files and dot-directories in the users HOME that
 serve JDS, and many more, do eventually not necessarily allow to
 reconstitute a valid state of the settings at all and any moment.
 Still, chances are high, that they will. In the worst case, the

Chances?  So what, we tell the user "try restoring to this snapshot,
login again and if stuff doesn't work, then try another snapshot"?  What
if the user discovers too late that the selected snapshot was
inconsistent, and by then they've made other changes?

 unlucky user can roll back to when he last took a break, if only for
 grabbing another coffee, because it took a minute, the writes (see

That sounds mighty painful.

I'd rather modify some high-profile apps to tell the filesystem that
their state is consistent, so take a snapshot.

 Maybe SUN can't do this, but wait for Apple, and OSX10-dot-something
 (using ZFS as default!) will know how to do it. (And they probably
 also know, when their 'writes' are done.)

I'm giving you the best answer -- modify the apps -- and you reject it.
Given how many important apps Apple controls, it wouldn't surprise me if
they did what I suggest.  We should do it too.  But one step at a time.
We need to set up a project, gather requirements, design a solution, ...
And since the solution will almost certainly entail modifications to
apps where heuristics won't help, well, I think this would be a project
with fairly wide scope, which means it likely won't go fast.

Nico
-- 


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Nicolas Williams
On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
 Nathan Kroenert [EMAIL PROTECTED] wrote:
  Are you indicating that the filesystem knows, or should know, what an 
  application is doing?
 
 Maybe snapshot file whenever a write-filedescriptor is closed or
 somesuch?

Again.  Not enough.  Some apps (many!) deal with multiple files.


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-26 Thread Nathan Kroenert
It occurred to me that we are likely missing the point here, because Uwe 
is thinking of this from a One User on a System sort of perspective, 
whereas most of the rest of us are thinking of it from a 'Solaris' 
perspective, where we typically expect the system to be running 
many applications / DBs / users all at the same time.

In Uwe's use cases thus far, it seems that he is interested only in 
simple, single-user style applications, if I'm not mistaken, so he's not 
considering the consequences of what it *really* means to have CDP in 
the way he wishes.

Uwe - am I close here?

Nathan.


Nicolas Williams wrote:
 On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote:
 The rub is this: how do you know when a file edit/modify has completed?
 Not to me, I'm sorry, this is task of the engineer, the implementer.
 (See 'atomic', as above.) It would be a shame if a file system never
 knew if the operation was completed.
 
 The filesystem knows if a filesystem operation completed.  It can't know
 application state.  You keep missing that.
 
 If an application has many files then an edit/modify may include
 updates and/or removals of more than one file. So once again: how do
 you know when an edit/modify has completed?
  So an 'edit' fires off a few child processes to do this and that and
  then you forget about them, hoping they do a proper job.  Oh,
  this gives me confidence ;)
 
 You'd rather the filesystem guess application state than have the app
 tell it?  Weird.  Your other alternative -- saving a history of every
 write -- doesn't work because you can't tell what point in time is safe
 to restore to.
 
 No, seriously, let's look at some applications:

 A. User works in Office (Star-Office, sure!) and clicks 'Save' for a
 current work before making major modifications. So the last state of
 the document (odt) is being stored. Currently we can set some Backup
 option to be done regularly. Meaning that the backup could have
 happened at the very wrong moment; while saving the state on each user
 request 'Save' is much better.
 
  So modify the office suite to call a new syscall that says "I'm
  internally consistent in all these files" and boom, the filesystem can
  now take a useful snapshot.
 
 B. A bunch of e-mails are read from the Inbox and stored locally
 (think Maildir). The user sees the sender, doesn't know her, and
 deletes all of them. Of course, the deletion process will have fired
 up the CDP-engine ('event') and retire the files instead of deletion.
 So when the sender calls, and the user learns that he made a big
 mistake, he can roll back to before the deletion (event).
 
 Now think of an application like this but which uses, say, SQLite (e.g.,
 Firefox 3.x, Thunderbird, ...).  The app might never close the database
 file, just fsync() once in a while.  The DB might have multiple files
 (in the SQLite case that might be multiple DBs ATTACHed into one
 database connection).  Also, an fsync of a SQLite journal file is not
 as useful to CDP as an fsync() of a SQLite DB proper.  Now add any of a
 large number of databases and apps to the mix and forget it -- the
 heuristics become impossible or mostly useless.
 
 C. (Sticking with /home/) I agree with you, that the rather continuous
 changes of the dot-files and dot-directories in the users HOME that
 serve JDS, and many more, do eventually not necessarily allow to
 reconstitute a valid state of the settings at all and any moment.
 Still, chances are high, that they will. In the worst case, the
 
  Chances?  So what, we tell the user "try restoring to this snapshot,
  login again and if stuff doesn't work, then try another snapshot"?  What
  if the user discovers too late that the selected snapshot was
  inconsistent, and by then they've made other changes?
 
 unlucky user can roll back to when he last took a break, if only for
 grabbing another coffee, because it took a minute, the writes (see
 
 That sounds mighty painful.
 
 I'd rather modify some high-profile apps to tell the filesystem that
 their state is consistent, so take a snapshot.
 
 Maybe SUN can't do this, but wait for Apple, and OSX10-dot-something
 (using ZFS as default!) will know how to do it. (And they probably
 also know, when their 'writes' are done.)
 
 I'm giving you the best answer -- modify the apps -- and you reject it.
 Given how many important apps Apple controls it wouldn't surprise me if
 they did what I suggest.  We should do it too.  But one step at a time.
 We need to set up a project, gather requirements, design a solution, ...
 And since the solution will almost certainly entail modifications to
 apps where heuristics won't help, well, I think this would be a project
 with fairly wide scope, which means it likely won't go fast.
 
 Nico


Re: [zfs-discuss] Performance with Sun StorageTek 2540

2008-02-26 Thread Cyril Plisko
On Wed, Feb 27, 2008 at 6:17 AM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Sun, 17 Feb 2008, Mertol Ozyoney wrote:

   Hi Bob;
  
   When you have some spare time can you prepare a simple benchmark report in
   PDF that I can share with my customers to demonstrate the performance of
   2540 ?

  While I do not claim that it is simple I have created a report on my
  configuration and experience.  It should be useful for users of the
  Sun StorageTek 2540, ZFS, and Solaris 10 multipathing.

  See

  
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf

Nov 26, 2008??? May I borrow your time machine? ;-)

-- 
Regards,
Cyril