Re: [zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones

2012-01-16 Thread Matthew Ahrens
On Mon, Jan 16, 2012 at 11:34 AM, Jim Klimov wrote:

> 2012-01-16 23:14, Matthew Ahrens wrote:
>
>> On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov wrote:
>>
>>>     While reading about ZFS on-disk formats, I wondered once again
>>>     why it is not possible to create a snapshot of existing data,
>>>     not at the current TXG but at some older point in time?
>>
>>
>> It is not possible because the older data may no longer exist on disk.
>> For example, say you want to take a snapshot from 10 txgs ago.  But since
>> then we have created a new file, which modified the containing
>> directory.  So we freed the directory block from 10 txgs ago.  That
>> freed block is then a candidate for reallocation.
>>
>> Existence of old uberblocks in the ring buffer does not indicate that
>> the data they reference is still valid.  This is the reason that "zpool
>> import -F" does not always work.
>>
>
> Hmmm... the way I understood it (though again I have no proof links
> handy) was that ZFS "recently" got a deferred-reuse feature, basically
> to guarantee those rollbacks. I am not sure which builds or distros
> it might be included in.
>
> If you authoritatively say it's not there (or not in illumos),
> I'm going to trust you ;)
>

It's definitely not there in illumos.  See TXG_DEFER_SIZE.  There was talk
of changing it at Oracle; I don't know if that ever happened.  If you have
an S11 (Solaris 11) system, you could probably use mdb to look at the size
of the ms_defermap.
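A rough way to picture the deferral Matt refers to: a block freed in some txg is parked on a defer list and only becomes allocatable again a couple of txgs later, so only the most recent txgs are protected from reuse. The sketch below is a toy model, not ZFS code; the `DeferredAllocator` class is hypothetical, the value 2 for `TXG_DEFER_SIZE` is an assumption to check against `sys/txg.h` in your tree, and the exact off-by-one timing of real ZFS may differ.

```python
from collections import deque

# Assumed value; check TXG_DEFER_SIZE in sys/txg.h of your illumos tree.
TXG_DEFER_SIZE = 2


class DeferredAllocator:
    """Toy block allocator (hypothetical, not ZFS code).

    Blocks freed in txg t only return to the free set once a later
    txg >= t + defer has finished syncing.
    """

    def __init__(self, nblocks, defer=TXG_DEFER_SIZE):
        self.free = set(range(nblocks))  # blocks available for allocation
        self.defer = defer
        self.pending = deque()           # (freed_txg, block), frees arrive in txg order

    def alloc(self):
        block = min(self.free)           # hand out the lowest free block
        self.free.remove(block)
        return block

    def free_block(self, txg, block):
        # Not reusable yet: park it on the defer queue.
        self.pending.append((txg, block))

    def sync_done(self, txg):
        # Release every block whose deferral period has elapsed.
        while self.pending and self.pending[0][0] + self.defer <= txg:
            _, block = self.pending.popleft()
            self.free.add(block)


alloc = DeferredAllocator(nblocks=4)
blk = alloc.alloc()            # block 0 holds, say, a directory
alloc.free_block(100, blk)     # txg 100 rewrites the directory elsewhere
alloc.sync_done(101)
print(blk in alloc.free)       # False: still deferred, not yet reusable
alloc.sync_done(102)
print(blk in alloc.free)       # True: block 0 may now be reallocated
```

With a deferral of only a couple of txgs, an uberblock from 10 txgs ago may well point at blocks that have already been handed out again, which is why rewinding that far is not guaranteed to work.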

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



2012-01-16 Thread Jim Klimov

2012-01-16 23:14, Matthew Ahrens wrote:

> On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov <jimkli...@cos.ru> wrote:
>
>>     While reading about ZFS on-disk formats, I wondered once again
>>     why it is not possible to create a snapshot of existing data,
>>     not at the current TXG but at some older point in time?
>
> It is not possible because the older data may no longer exist on disk.
> For example, say you want to take a snapshot from 10 txgs ago.  But since
> then we have created a new file, which modified the containing
> directory.  So we freed the directory block from 10 txgs ago.  That
> freed block is then a candidate for reallocation.
>
> Existence of old uberblocks in the ring buffer does not indicate that
> the data they reference is still valid.  This is the reason that "zpool
> import -F" does not always work.


Hmmm... the way I understood it (though again I have no proof links
handy) was that ZFS "recently" got a deferred-reuse feature, basically
to guarantee those rollbacks. I am not sure which builds or distros
it might be included in.

If you authoritatively say it's not there (or not in illumos),
I'm going to trust you ;)

What about injecting snapshots into static data, i.e. before at
least one existing snapshot? Is that possible? I do get your
point about missing older directory data and the possible invalidity
of such a snapshot as a ZPL dataset (and that it would probably be a
bad basis for a writeable clone)... but let's call them checkpoints
then, and limit their use to zfs send and fencing off corrupted ranges ;)

Is that technically possible or logically reasonable?

Thanks,
//Jim



2012-01-16 Thread Matthew Ahrens
On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov wrote:

> While reading about ZFS on-disk formats, I wondered once again
> why it is not possible to create a snapshot of existing data,
> not at the current TXG but at some older point in time?
>

It is not possible because the older data may no longer exist on disk.  For
example, say you want to take a snapshot from 10 txgs ago.  But since then we
have created a new file, which modified the containing directory.  So we
freed the directory block from 10 txgs ago.  That freed block is then a
candidate for reallocation.
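To make the reallocation argument concrete, here is a toy copy-on-write model (the `ToyCowPool` class is hypothetical, not ZFS code). It frees an overwritten block immediately, ignoring the short deferral ZFS applies, and hands out the lowest free address first, which is not the real ZFS allocation policy. As with a ZFS block pointer, each pointer carries a checksum, so a stale pointer from an older txg is detected rather than silently followed.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyCowPool:
    """Toy copy-on-write pool with no snapshots: overwritten blocks are freed at once."""

    def __init__(self, nblocks=8):
        self.disk = {}                    # address -> bytes
        self.free = list(range(nblocks))  # free list; lowest address handed out first

    def write(self, data, old_ptr=None):
        """Write data to a fresh block and return an (address, checksum) pointer."""
        addr = self.free.pop(0)
        self.disk[addr] = data
        if old_ptr is not None:
            # Nothing references the old copy any more, so it becomes reusable.
            self.free.insert(0, old_ptr[0])
        return (addr, checksum(data))

    def read(self, ptr):
        """Return the block's data, or None if it no longer matches the pointer."""
        addr, expected = ptr
        data = self.disk.get(addr)
        if data is None or checksum(data) != expected:
            return None
        return data


pool = ToyCowPool()
dir_v1 = pool.write(b"dir: a.txt")                        # txg 10
dir_v2 = pool.write(b"dir: a.txt b.txt", old_ptr=dir_v1)  # txg 11: new file, dir rewritten
new_data = pool.write(b"contents of b.txt")               # txg 12: reuses dir_v1's block
print(pool.read(dir_v2))   # b'dir: a.txt b.txt'
print(pool.read(dir_v1))   # None: the txg-10 directory block was overwritten
```

A "snapshot of txg 10" would need `dir_v1`, but by txg 12 its block already holds unrelated data.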

Existence of old uberblocks in the ring buffer does not indicate that the
data they reference is still valid.  This is the reason that "zpool import
-F" does not always work.
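The consequence for `zpool import -F` can be sketched as "walk the uberblock ring from newest to oldest and take the first entry whose tree still verifies". This is a simplified, hypothetical illustration: `Uberblock`, `rewind_candidate`, and the `read_block` callback are invented names, and only the root block is checked here, whereas a real rewind has much more of the tree to validate and can still fail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

BlockPtr = Tuple[str, str]  # (address, checksum): a stand-in for a ZFS block pointer


@dataclass(frozen=True)
class Uberblock:
    txg: int
    root: BlockPtr  # root of the tree committed in this txg


def rewind_candidate(
    ring: Iterable[Uberblock],
    read_block: Callable[[BlockPtr], Optional[bytes]],
) -> Optional[Uberblock]:
    """Return the newest uberblock whose root block still verifies, else None.

    read_block(ptr) is a hypothetical callback that returns None when the
    block at ptr has been reused and no longer matches its checksum.
    """
    for ub in sorted(ring, key=lambda u: u.txg, reverse=True):
        if read_block(ub.root) is not None:
            return ub
    # The ring still exists, but every entry points at overwritten data.
    return None


# Example: the block written in txg 11 was damaged, txg 10's is intact.
intact = {("0x200", "c10")}
ring = [Uberblock(10, ("0x200", "c10")), Uberblock(11, ("0x400", "c11"))]


def reader(ptr: BlockPtr) -> Optional[bytes]:
    return b"ok" if ptr in intact else None


print(rewind_candidate(ring, reader).txg)  # 10
```

The mere presence of an entry in the ring is not enough; what matters is whether the blocks it references still verify.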

--matt



2012-01-13 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> Perhaps I need to specify some use cases more clearly:

Actually, I'm not sure you do need to specify use cases more clearly,
because the idea is obviously awesome.  The main problem, if you're
interested, is getting attention.  Maybe it's more work than I know, but I
agree with you: at first blush it doesn't sound like much work.

I think the most compelling use case you mentioned was the ability to
resume an interrupted zfs send.

It's one of those things that is not super-super useful (most people are
content with whatever snapshot and zfs send scheme they already have today),
but if it's not much work, then maybe it's worthwhile anyway.

But there's a finite amount of development resources, and other features
are in higher demand (such as BP rewrite, etc.).  Why would Oracle or
Nexenta care about devoting the effort?  Maybe it's possible, maybe there
just isn't enough motivation...




2012-01-12 Thread Jim Klimov

2012-01-13 7:26, Steve Gonczi wrote:

> Jim,
>
> Any modified block (in the absence of a snapshot) gets rewritten
> to a new location, and the original block is freed.
>
> So the earlier state you want to go back and snapshot is no longer there.
>
> The essence of taking a snapshot is keeping the original blocks
> instead of freeing them.
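Steve's point can be illustrated with a birth-txg rule: when copy-on-write replaces a block, the old block is freed only if it was born after the most recent snapshot; otherwise a snapshot still references it and it is kept on the dataset's deadlist. As far as I know this is roughly the check ZFS makes in `dsl_dataset_block_kill()`, but the `Dataset` class below is a hypothetical simplification, not ZFS code. It also shows why a snapshot cannot be taken retroactively: the keep-or-free decision is made at the moment of the overwrite, using whatever snapshot exists then.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Hypothetical, simplified dataset tracking one 'previous snapshot' txg."""

    prev_snap_txg: int = 0                             # txg of the latest snapshot (0 = none)
    deadlist: List[int] = field(default_factory=list)  # blocks kept alive for snapshots
    freed: List[int] = field(default_factory=list)     # blocks returned to the allocator

    def snapshot(self, txg: int) -> None:
        self.prev_snap_txg = txg

    def kill_block(self, addr: int, birth_txg: int) -> None:
        """Called when copy-on-write replaces a block born in birth_txg."""
        if birth_txg > self.prev_snap_txg:
            # Born after the latest snapshot: no snapshot can see it, so free it.
            self.freed.append(addr)
        else:
            # Still referenced by a snapshot: keep it instead of freeing it.
            self.deadlist.append(addr)


ds = Dataset()
ds.kill_block(addr=1, birth_txg=5)   # no snapshot yet: freed, gone for good
ds.snapshot(txg=10)
ds.kill_block(addr=2, birth_txg=8)   # born before the snapshot: kept
ds.kill_block(addr=3, birth_txg=12)  # born after the snapshot: freed
print(ds.freed, ds.deadlist)         # [1, 3] [2]
```

Block 1 is the case Jim asks about: it was freed before any snapshot existed, so a snapshot taken later cannot bring it back.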


Perhaps I need to specify some use cases more clearly:

1) A snapshot added in between existing snapshots, or even
   before the first currently existing one, i.e. just to
   facilitate sending incremental snapshots in small chunks
   over lousy media (where zfs send is likely never to
   succeed for huge datasets sent as one initial stream).

2) Cloning and/or rollback of a dataset to some point in
   time (TXG number) for which I forgot to take a timely
   snapshot. Apparently, this would only work to ignore
   added data, since overwritten blocks would be lost.

   Exception: there is a "last chance" to reference the last
   32-128 TXGs, whose uberblocks still exist in the
   ring. Say, 128*5 sec = 640 sec, i.e. over 10.5 min of rollback
   info guaranteed not to be overwritten by ZFS COW.
   This would cover most of those "Oh sh*t, what
   have I done!?" moments of operator/admin errors,
   typos, etc. Injecting a snapshot at "3 minutes ago"
   would help retain the data not-actually-deleted
   from disk while you go about repairing the damage ;)

   Perhaps this would even allow undeleting datasets
   which you never intended to destroy (notably, I once had
   Live Upgrade (LU) boot environment (BE) deletion trying
   to kill off my zone datasets some time around snv_101 or
   so; they were only saved by being mounted and running at
   the time).

3) Use together with the previously proposed replacement of
   existing snapshots (with degraded, unreadable blocks) while
   maintaining the rest of the snapshot/clone tree. If this
   "technology" were implemented, injected snaps
   could naturally be used to "fence off" the corrupted
   area (TXG number range) and replace the resulting
   smaller corrupt snapshot with good data from other
   storage.

   I hope it is not theoretically impossible to write
   this replacement snapshot in such a manner that the
   resulting sequence of block histories would still
   make sense as valid files. This block reallocation
   is not much different from the automatic repairs done
   on resilver or scrub... I think :)
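For use case 1, the benefit of extra snapshots in the chain is that each transfer becomes one small, independently retryable step. The helper below only builds `zfs send` command lines for a chain of snapshots; it does not run anything. The dataset name `tank/data` and the snapshot names are hypothetical, with `mid1` and `mid2` standing in for injected snapshots, which current ZFS cannot create retroactively.

```python
from typing import List, Sequence


def incremental_send_plan(dataset: str, snapshots: Sequence[str]) -> List[List[str]]:
    """Return one `zfs send` argv per step: a full stream for the oldest
    snapshot, then an incremental (-i) stream between each adjacent pair."""
    if not snapshots:
        return []
    plan = [["zfs", "send", f"{dataset}@{snapshots[0]}"]]
    for prev, cur in zip(snapshots, snapshots[1:]):
        plan.append(["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{cur}"])
    return plan


# Hypothetical chain; mid1 and mid2 play the role of injected snapshots.
for argv in incremental_send_plan("tank/data", ["base", "mid1", "mid2", "now"]):
    print(" ".join(argv))
```

This prints `zfs send tank/data@base` followed by three `zfs send -i` lines, one per adjacent pair. Each stream would be piped to `zfs recv` on the receiving side; if one step fails over a flaky link, only that increment has to be resent.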
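The "32-128 TXGs" and "640 sec" figures in use case 2 come from simple arithmetic. The sketch assumes the classic vdev label layout, where each label holds a 128 KiB uberblock ring with slots of max(1 KiB, 2^ashift) bytes, and a busy pool that commits a txg every 5 seconds; both are assumptions to check against your build. As Matt's reply in this thread points out, this only bounds how far back uberblocks reach, not how long the blocks they reference survive.

```python
UBERBLOCK_RING_BYTES = 128 * 1024  # assumed: uberblock ring size per vdev label
MIN_UBERBLOCK_SHIFT = 10           # assumed: slots are never smaller than 1 KiB


def uberblock_slots(ashift: int) -> int:
    """Number of uberblocks the ring holds for a vdev with the given ashift."""
    slot_bytes = 1 << max(ashift, MIN_UBERBLOCK_SHIFT)
    return UBERBLOCK_RING_BYTES // slot_bytes


def rollback_window_seconds(ashift: int, txg_interval_s: float = 5.0) -> float:
    """Wall-clock time spanned by a full ring, assuming one txg is committed
    every txg_interval_s seconds. Says nothing about whether the referenced
    blocks still exist."""
    return uberblock_slots(ashift) * txg_interval_s


for ashift in (9, 12):
    slots = uberblock_slots(ashift)
    minutes = rollback_window_seconds(ashift) / 60
    print(f"ashift={ashift}: {slots} slots, about {minutes:.1f} min")
```

This gives 128 slots (about 10.7 min) for 512-byte sectors and 32 slots (about 2.7 min) for 4 KiB sectors, which matches the 32-128 range above.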
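Use case 3 boils down to finding the smallest pair of snapshots that brackets the damaged txg range: the increment between them is what would have to be replaced. The sketch assumes the damage is known as a range of block birth txgs and that a snapshot taken at txg T contains everything born up to T. `fence_interval` and the txg numbers in the example are hypothetical, and injected snapshots are the feature being proposed here, not something ZFS offers.

```python
import bisect
from typing import Optional, Sequence, Tuple


def fence_interval(
    snap_txgs: Sequence[int], bad_first: int, bad_last: int
) -> Tuple[Optional[int], Optional[int]]:
    """Return (older, newer): the closest snapshot txgs bracketing the
    damaged birth-txg range [bad_first, bad_last].

    older: newest snapshot taken before bad_first (None if there is none).
    newer: oldest snapshot taken at or after bad_last (None if there is none).
    """
    txgs = sorted(snap_txgs)
    i = bisect.bisect_left(txgs, bad_first)   # first snapshot >= bad_first
    older = txgs[i - 1] if i > 0 else None
    j = bisect.bisect_left(txgs, bad_last)    # first snapshot >= bad_last
    newer = txgs[j] if j < len(txgs) else None
    return older, newer


# Damage in blocks born in txgs 620-640.
print(fence_interval([100, 500, 900], 620, 640))            # (500, 900)
print(fence_interval([100, 500, 600, 650, 900], 620, 640))  # (600, 650)
```

In this example, injecting snapshots at txgs 600 and 650 would shrink the increment to replace from 400 txgs to 50.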

Thanks,
//Jim