Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-11 Thread David Dyer-Bennet

On Tue, August 10, 2010 23:13, Ian Collins wrote:
 On 08/11/10 03:45 PM, David Dyer-Bennet wrote:

 cannot receive incremental stream: most recent snapshot of
 bup-wrack/fsfs/zp1/ddb does not
 match incremental source

 That last error occurs if the snapshot exists but has changed, i.e. it has
 been deleted and a new one with the same name created.

So for testing purposes at least, I need to shut down everything I have
that creates or deletes snapshots.  (I don't, though, have anything that
would delete one and create one with the same name.  I create snapshots
with various names (2hr, daily, weekly, monthly, yearly) and a current
timestamp, and I delete old ones (many days old at a minimum).)

And I think I'll abstract the commands from my backup script into a
simpler dedicated test script, so I'm sure I'm doing exactly the same
thing each time (that should cause me to hit on a combination that works
right away :-) ).
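
Concretely, the test script would be something like this minimal sketch
(the snapshot naming here is just for illustration, not the exact scheme
my backup script uses):

  #!/usr/bin/bash
  # Minimal sketch: one recursive snapshot, then one full replication
  # send/receive into the backup pool.
  set -x
  SNAP="zp1@bup-test-$(date -u +%Y%m%d-%H%M%S)GMT"
  zfs snapshot -r "$SNAP"
  zfs create -p bup-wrack/fsfs/zp1
  zfs send -Rp "$SNAP" | zfs recv -Fud bup-wrack/fsfs/zp1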

Is there anything stock in b134 that messes with snapshots that I should
shut down to keep things stable, or am I only worried about my own stuff?

Are other people out there not using send/receive for backups?  Or not
trying to preserve snapshots while doing it?  Or, are you doing what I'm
doing, and not having the problems I'm having?
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-11 Thread David Dyer-Bennet

On Tue, August 10, 2010 16:41, Dave Pacheco wrote:
 David Dyer-Bennet wrote:

 If that turns out to be the problem, that'll be annoying to work around
 (I'm making snapshots every two hours and deleting them after a couple of
 weeks).  Locks between admin scripts rarely end well, in my experience.
 But at least I'd know what I had to work around.

 Am I looking for too much here?  I *thought* I was doing something that
 should be simple and basic and frequently used nearly everywhere, and
 hence certain to work.  What could go wrong?, I thought :-).  If I'm
 doing something inherently dicey I can try to find a way to back off; as
 my primary backup process, this needs to be rock-solid.


 It's certainly a reasonable thing to do and it should work.  There have
 been a few problems around deleting and renaming snapshots as they're
 being sent, but the delete issues were fixed in build 123 by having
 zfs_send hold snapshots being sent (as long as you've upgraded your pool
 past version 18), and it sounds like you're not doing renames, so your
 problem may be unrelated.

AHA!  You may have nailed the issue -- I've upgraded from 111b to 134, but
have not yet upgraded my pool.  Checking...yes, the pool I'm sending from
is V14.  (I don't instantly upgrade pools; I need to preserve the option
of falling back to older software for a while after an upgrade.)

So, I should try either turning off my snapshot creator/deleter during the
backup, or upgrading the pool.  Will do!  (I will eventually upgrade the
pool, of course, but I think I'll try the more reversible option first.  I
can have the deleter check for the pid file the backup already creates to
avoid two backups running at once, so the deleter won't touch snapshots
while a backup is in progress.)
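
(For the record, the checks I have in mind look roughly like this; a sketch,
not my actual scripts, and the pid-file path is made up:)

  # Check what version the sending pool is at (mine reports 14):
  zpool get version zp1
  # Upgrading would take it past version 18, where zfs send holds the
  # snapshots it is sending; it's one-way, though, so not yet:
  #   zpool upgrade zp1
  # In the snapshot deleter, bail out if a backup is in progress:
  if [ -f /var/run/zfs-backup.pid ]; then
      echo "backup running, skipping snapshot deletions" >&2
      exit 0
  fi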

Thank you very much!  This is extremely encouraging.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-11 Thread Paul Kraus
On Wed, Aug 11, 2010 at 10:36 AM, David Dyer-Bennet d...@dd-b.net wrote:

 On Tue, August 10, 2010 16:41, Dave Pacheco wrote:
 David Dyer-Bennet wrote:

 If that turns out to be the problem, that'll be annoying to work around
 (I'm making snapshots every two hours and deleting them after a couple of
 weeks).  Locks between admin scripts rarely end well, in my experience.
 But at least I'd know what I had to work around.

I've had good luck with locks (eventually), but they are not trivial
if you want them to be robust. It usually takes a bunch of trial and
error for me.
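
One pattern that tends to hold up is roughly this (a sketch, not our
production code):

  # One exclusive lock shared by the backup and snapshot scripts.
  # mkdir is atomic, so two scripts can't both think they got the lock.
  LOCKDIR=/var/run/zfs-admin.lock
  if ! mkdir "$LOCKDIR" 2>/dev/null; then
      echo "another zfs admin job holds the lock, exiting" >&2
      exit 1
  fi
  trap 'rmdir "$LOCKDIR"' EXIT
  # ... do the send/receive or snapshot create/destroy work here ...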

 Am I looking for too much here?  I *thought* I was doing something that
 should be simple and basic and frequently used nearly everywhere, and
 hence certain to work.  What could go wrong?, I thought :-).  If I'm
 doing something inherently dicey I can try to find a way to back off; as
 my primary backup process, this needs to be rock-solid.

It looks like you are trying to do a full send every time; what about a
first full, then incrementals (which should be much faster)? The first
full might run afoul of the 2-hour snapshots (and deletions), but I would
not expect the incrementals to. I am syncing about 20 TB of data between
sites this way every 4 hours over a 100 Mb link. I put the snapshot
management and the site-to-site replication in the same script to keep
them from fighting :-)
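
The structure is roughly this (a sketch; the dataset, host, and state-file
names are made up, not our real ones):

  # Snapshot, replicate incrementally, and only then prune old snapshots,
  # all in one script so the steps can never race with each other.
  NOW=$(date -u +%Y%m%d-%H%M%S)
  PREV=$(cat /var/tmp/last-good-snap)   # newest snapshot on both sides
  zfs snapshot -r "tank/data@sync-$NOW"
  zfs send -R -I "@$PREV" "tank/data@sync-$NOW" |
      ssh backuphost zfs recv -Fud backup/tank &&
      echo "sync-$NOW" > /var/tmp/last-good-snap
  # ... destroy snapshots older than the retention window here ...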

-- 
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


[zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet
My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages; the
errors are at the end of the output below.

sending from @bup-daily-20100726-10CDT to zp1/d...@bup-daily-20100727-10cdt
received 3.80GB stream in 136 seconds (28.6MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100727-10cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100727-10cdt
sending from @bup-daily-20100727-10CDT to zp1/d...@bup-daily-20100728-11cdt
received 192MB stream in 10 seconds (19.2MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100728-11cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100728-11cdt
sending from @bup-daily-20100728-11CDT to zp1/d...@bup-daily-20100729-10cdt
received 170MB stream in 9 seconds (18.9MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100729-10cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100729-10cdt
sending from @bup-daily-20100729-10CDT to zp1/d...@bup-2hr-20100729-22cdt
warning: cannot send 'zp1/d...@bup-2hr-20100729-22cdt': no such pool or dataset
sending from @bup-2hr-20100729-22CDT to zp1/d...@bup-2hr-20100730-00cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-00cdt': no such pool or dataset
sending from @bup-2hr-20100730-00CDT to zp1/d...@bup-2hr-20100730-02cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-02cdt': no such pool or dataset
sending from @bup-2hr-20100730-02CDT to zp1/d...@bup-2hr-20100730-04cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-04cdt': incremental source (@bup-2hr-20100730-02CDT) does not exist
sending from @bup-2hr-20100730-04CDT to zp1/d...@bup-2hr-20100730-06cdt
sending from @bup-2hr-20100730-06CDT to zp1/d...@bup-2hr-20100730-08cdt
sending from @bup-2hr-20100730-08CDT to zp1/d...@bup-daily-20100730-10cdt
sending from @bup-daily-20100730-10CDT to zp1/d...@bup-2hr-20100730-10cdt
sending from @bup-2hr-20100730-10CDT to zp1/d...@bup-2hr-20100730-12cdt
sending from @bup-2hr-20100730-12CDT to zp1/d...@bup-2hr-20100730-14cdt
sending from @bup-2hr-20100730-14CDT to zp1/d...@bup-2hr-20100730-16cdt
sending from @bup-2hr-20100730-16CDT to zp1/d...@bup-2hr-20100730-18cdt
sending from @bup-2hr-20100730-18CDT to zp1/d...@bup-2hr-20100730-20cdt
sending from @bup-2hr-20100730-20CDT to zp1/d...@bup-2hr-20100730-22cdt
received 162MB stream in 9 seconds (18.0MB/sec)
receiving incremental stream of zp1/d...@bup-2hr-20100730-06cdt into bup-wrack/fsfs/zp1/d...@bup-2hr-20100730-06cdt
cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but the
send/receive process.

There are quite a lot of snapshots: dailies for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.
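
(Probably just a matter of turning on tracing in the script and capturing
everything from the run; a sketch, with made-up script and log names:)

  # In the backup script, print each command after variable expansion:
  set -x
  # When invoking it, keep the trace, warnings, and errors in a log:
  ./zfs-backup.sh > /var/tmp/zfs-backup-run.log 2>&1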

Any other ideas?

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet
Additional information.  I started another run, and captured the exact
expanded commands.  These SHOULD BE the exact commands used in the last
run except for the snapshot name (this script makes a recursive snapshot
just before it starts a backup).  In any case they ARE the exact commands
used in this new run, and we'll see what happens at the end of this run.

(These are from a bash trace as produced by set -x)

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1

(The send and the receive are the source and sink of a pipeline.)  As you
can see, the destination filesystem is new in the bup-wrack pool.  The -R
on the send should, as I understand it, create a replication stream which
will "replicate the specified filesystem, and all descendent file systems,
up to the named snapshot"; when received, "all properties, snapshots,
descendent file systems, and clones are preserved".  This should send the
full state of zp1 up to the snapshot, and the receive should receive it
into bup-wrack/fsfs/zp1.

Isn't this how a full backup should be made using zfs send/receive? 
(Once this is working, I intend to use -I to send incremental streams to
update it regularly.)
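
(The incremental updates I have in mind would look roughly like this; a
sketch, with placeholder snapshot names:)

  # $PREV is the newest snapshot already on bup-wrack, $NEW is a newer
  # snapshot just taken on zp1; both names are placeholders.
  zfs send -R -I "@$PREV" "zp1@$NEW" | zfs recv -Fud bup-wrack/fsfs/zp1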

bash-4.0$ zpool list
NAME         SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
bup-wrack    928G  4.62G   923G     0%  1.00x  ONLINE  /backups/bup-wrack
rpool        149G  10.0G   139G     6%  1.00x  ONLINE  -
zp1         1.09T   743G   373G    66%  1.00x  ONLINE  -

zp1 is my primary data pool.  It's not very big (physically it's three
2-way mirrors of 400GB drives), and it has 743G of data in it.  bup-wrack
is the backup pool; it's a single 1TB external USB drive.  This was taken
shortly after starting the second try at a full backup (since the b134
upgrade), so bup-wrack is still mostly empty.

None of the pools have shown any errors of any sort in months.  zp1 and
rpool are scrubbed weekly.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread Dave Pacheco

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages. 
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but the
send/receive process.

There are quite a lot of snapshots; dailys for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?



Is it possible that snapshots were renamed on the sending pool during 
the send operation?


-- Dave


--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread Dave Pacheco

David Dyer-Bennet wrote:

On Tue, August 10, 2010 13:23, Dave Pacheco wrote:

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages.
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but
the
send/receive process.

There are quite a lot of snapshots; dailys for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?


Is it possible that snapshots were renamed on the sending pool during
the send operation?


I don't have any scripts that rename a snapshot (in fact I didn't know it
was possible until just now), and I don't have other users with permission
to make snapshots (either delegated or by root access).  I'm not using the
Sun auto-snapshot thing, I've got a much-simpler script of my own (hence I
know what it does).  So I don't at the moment see how one would be getting
renamed.

It's possible that a snapshot was *deleted* on the sending pool during the
send operation, however.  Also that snapshots were created (however, a
newly created one would be after the one specified in the zfs send -R, and
hence should be irrelevant).  (In fact it's certain that snapshots were
created and I'm nearly certain of deleted.)

If that turns out to be the problem, that'll be annoying to work around
(I'm making snapshots every two hours and deleting them after a couple of
weeks).  Locks between admin scripts rarely end well, in my experience. 
But at least I'd know what I had to work around.


Am I looking for too much here?  I *thought* I was doing something that
should be simple and basic and frequently used nearly everywhere, and
hence certain to work.  What could go wrong?, I thought :-).  If I'm
doing something inherently dicey I can try to find a way to back off; as
my primary backup process, this needs to be rock-solid.



It's certainly a reasonable thing to do and it should work.  There have 
been a few problems around deleting and renaming snapshots as they're 
being sent, but the delete issues were fixed in build 123 by having 
zfs_send hold snapshots being sent (as long as you've upgraded your pool 
past version 18), and it sounds like you're not doing renames, so your 
problem may be unrelated.


-- Dave

--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet

On 10-Aug-10 13:46, David Dyer-Bennet wrote:


On Tue, August 10, 2010 13:23, Dave Pacheco wrote:

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages.
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but
the
send/receive process.

There are quite a lot of snapshots; dailys for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?



Is it possible that snapshots were renamed on the sending pool during
the send operation?


I don't have any scripts that rename a snapshot (in fact I didn't know it
was possible until just now), and I don't have other users with permission
to make snapshots (either delegated or by root access).  I'm not using the
Sun auto-snapshot thing, I've got a much-simpler script of my own (hence I
know what it does).  So I don't at the moment see how one would be getting
renamed.

It's possible that a snapshot was *deleted* on the sending pool during the
send operation, however.  Also that snapshots were created (however, a
newly created one would be after the one specified in the zfs send -R, and
hence should be irrelevant).  (In fact it's certain that snapshots were
created and I'm nearly certain of deleted.)


More information.  The test I started this morning errored out somewhat 
similarly, and one set of errors is clearly deleted snapshots (they're 
2hr snapshots, some of which get deleted every 2 hours).  There are also 
errors relating to incremental streams, which is strange since I'm not 
using -I or -i at all.


Here are the commands again, and all the output.

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1
warning: cannot send 'zp1/d...@bup-2hr-20100731-12cdt': no such pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-14cdt': no such pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-16cdt': no such pool or dataset
warning: cannot send 'zp1/d...@bup-20100731-213303gmt': incremental source (@bup-2hr-20100731-16CDT) does not exist
warning: cannot send 'zp1/d...@bup-2hr-20100731-18cdt': no such pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-20cdt': incremental source (@bup-2hr-20100731-18CDT) does not exist
cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source

Afterward,

bash-4.0$ zpool list
NAME         SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
bup-wrack    928G   687G   241G    73%  1.00x  ONLINE  /backups/bup-wrack
rpool        149G  10.0G   139G     6%  1.00x  ONLINE  -
zp1         1.09T   743G   373G    66%  1.00x  ONLINE  -

So quite a lot did get transferred, but not all.

So, it appears clear that snapshots being deleted during the zfs send -R 
causes a warning.  A warning is fine: since they're not there it can't 
send them, and they were there when the command was given, so it makes 
sense for it to have tried.


That last message, which is not tagged as either a warning or an error, 
worries me though.  And I'm wondering how complete the transfer is; I 
believe the backup copy is compressed whereas the zp1 copy isn't, so the 
ALLOC being that different isn't clear-cut evidence of anything.


I'll try to guess a few things that should be recent and see if they in 
fact got into the backup.
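
(A broader check would be comparing the snapshot lists on the two sides;
a sketch, bearing in mind that snapshots deleted on zp1 since the send
started will show up as differences:)

  # List snapshots on the source, rewrite their names to where they
  # should land under the backup pool, and diff against what arrived.
  zfs list -H -o name -t snapshot -r zp1 |
      sed 's|^zp1|bup-wrack/fsfs/zp1|' | sort > /tmp/expected
  zfs list -H -o name -t snapshot -r bup-wrack/fsfs/zp1 | sort > /tmp/actual
  diff /tmp/expected /tmp/actual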


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread Ian Collins

On 08/11/10 03:45 PM, David Dyer-Bennet wrote:

On 10-Aug-10 13:46, David Dyer-Bennet wrote:


It's possible that a snapshot was *deleted* on the sending pool 
during the

send operation, however.  Also that snapshots were created (however, a
newly created one would be after the one specified in the zfs send 
-R, and

hence should be irrelevant).  (In fact it's certain that snapshots were
created and I'm nearly certain of deleted.)


More information.  The test I started this morning errored out 
somewhat similarly, and one set of errors is clearly deleted snapshots 
(they're 2hr snapshots that some of get deleted every 2 hours).  There 
are also errors relating to incremental streams which is strange 
since I'm not using -I or -i at all.


Here are the commands again, and all the output.

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1
warning: cannot send 'zp1/d...@bup-2hr-20100731-12cdt': no such 
pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-14cdt': no such 
pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-16cdt': no such 
pool or dataset
warning: cannot send 'zp1/d...@bup-20100731-213303gmt': incremental 
source (@bup-2hr-20100731-16CDT) does not exist
warning: cannot send 'zp1/d...@bup-2hr-20100731-18cdt': no such 
pool or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-20cdt': incremental 
source (@bup-2hr-20100731-18CDT) does not exist
cannot receive incremental stream: most recent snapshot of 
bup-wrack/fsfs/zp1/ddb does not

match incremental source

That last error occurs if the snapshot exists but has changed, i.e. it has 
been deleted and a new one with the same name created.


That last message, which is not tagged as either warning or error, 
worries me though.  And wondering how complete the transfer is; I 
believe the backup copy is compressed whereas the zp1 copy isn't, so 
the ALLOC being that different isn't clear-cut evidence of anything.



It probably aborted the send.

--
Ian.
