> In general, your backup software should handle making
> incremental dumps, even from a split mirror. What are
> you using to write data to tape? Are you simply
> dumping the whole file system, rather than using
> standard backup software?
> 
We are using Veritas NetBackup 5 MP4.  It backs up a VxFS filesystem mounted 
from a BCV (business continuance volume) split off a set of mirrors on an EMC 
DMX.  Prior to the split, the database is placed into hot backup mode so that 
the .dbf files are in a consistent state.

VxFS Storage Checkpoints can record the blocks that differ between checkpoints 
and house them so that a checkpoint can be mounted read-only or read-write.  
The nodata method of taking checkpoints records only which blocks have 
changed, and that information could be piped to the NetBackup master server, 
provided you have the NetBackup agents that can do this (a sketch of the 
checkpoint workflow follows below).  Our issue is that the supported method of 
taking nodata checkpoints on a VxFS filesystem requires taking them on the 
database host itself, and we would like, if possible, to keep the backups 
off-host and not steal any CPU cycles from the database host.

AFAIK, NetBackup has no method that can track differences in files on a 
filesystem that is presented to a host and backed up there while the BCV disks 
are repeatedly reattached/resynched to their source mirrors (hidden from the 
backup host during the resynch), then resplit and remounted on the backup host 
for the next backup cycle.  NetBackup does not know which blocks have changed, 
so it treats every .dbf file as different from the previous backup and backs 
up the whole file, i.e. a full backup every time.  I'd love to implement an 
incremental backup for these files, yet I don't know which agents can do this 
when the storage is not present on the off-host backup server all the time.  
Maybe that is not an issue, but I would need to research it more.
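
For reference, this is roughly what the on-host checkpoint workflow looks 
like; a minimal sketch, with flag spellings recalled from the VxFS 
administrator's guide (verify against your release), assuming a VxFS 
filesystem mounted at /ora01 on /dev/vx/dsk/oradg/oravol (names are 
illustrative).  It has to run on the database host, which is exactly what we 
want to avoid:

  # create a storage checkpoint of the mounted filesystem
  fsckptadm create ckpt_0410 /ora01

  # convert it to a nodata checkpoint, keeping only the changed-block map
  fsckptadm set nodata ckpt_0410 /ora01

  # show the checkpoints on the filesystem
  fsckptadm list /ora01

  # a full (data) checkpoint can also be mounted via its pseudo device
  mount -F vxfs -o ckpt=ckpt_0410 \
      /dev/vx/dsk/oradg/oravol:ckpt_0410 /ora01_ckpt_0410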

> ZFS snapshots use a pure copy-on-write model. If you
> have a block containing some data, and you write
> exactly the same data to that block, ZFS will
> allocate a new block for it. (It would be possible to
> change this, but I can't think of many environments
> where detecting duplicate blocks would be
> advantageous, since most synchronization tools won't
> copy duplicate blocks.)
> 
I guess I understand this.  So any time a new file is created, it takes new 
blocks.  There would be no point in copying the same file on top of itself; 
what I wanted was an application that could see the differences between two 
files and, where the bits are identical, leave them alone, rewriting only the 
differing bits to make the two files equivalent.  If it goes as far as 
rewriting the whole file, then no snapshot space saving is accomplished.

> rsync does actually detect unchanged portions of
> files and avoids copying them. However, I'm not sure
> if it also avoids *rewriting* them, so it may not
> help you.
> 
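
That is worth testing.  As I read the rsync man page, --inplace writes the 
changed data directly into the existing destination file instead of building a 
temporary copy, and --no-whole-file forces the delta-transfer algorithm even 
when both ends are local (rsync normally copies whole files locally).  Whether 
unchanged blocks are then truly left untouched on a snapshotted ZFS filesystem 
is exactly what I would need to verify; a sketch, with illustrative paths and 
dataset names:

  # update the backup copy in place, rewriting only changed regions
  rsync -a --inplace --no-whole-file /ora01/ /backup/ora01/

  # snapshot the destination; ideally only the rewritten blocks
  # consume new space under the copy-on-write model
  zfs snapshot tank/backup/ora01@`date +%Y%m%d`
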
> You also wrote:
> >RMAN can [collect changes at the block level from
> >Oracle files], yet that still keeps things
> >down at a DBA level, yet I need to keep this backup
> >processing at the SA level.
> 
> This sounds like you have a political problem that
> really should be fixed. Splitting a mirror is not
> sufficient to have an Oracle backup from which you
> can safely restore, so the DBAs must already be
> cooperating with the SAs on backups. Proper use of
> the database backup tools can make the backup window
> shorter and 
> 
> zfs send/receive can be used to back up only changed
> blocks; vxfs also has incremental block-based backup
> available, but the licensing fees may be high.

It is true that our DBAs support our existing configuration, yet I feel that a 
full backup in every backup window is not the fastest method.  And if you have 
to back up fully, you have to restore fully to get a database back to a 
previous state.  That is a consequence of our method, since we do not back up 
directly from the database host.  Our restore path would be to restore the 
data from tape back to a BCV disk, and then reverse-sync (a BCV restore) that 
data back to the original devices; a rough sketch follows below.  This would 
be a time-consuming process, though it could be quicker on machines with a 
VxFS filesystem, since a checkpoint could be remounted there to become the 
live filesystem.  I failed to mention earlier that we also have some databases 
running on UFS filesystems that rely on this BCV synch/split process for 
off-host backup, so any time we need to restore data from tape, it will be a 
long process.
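
For concreteness, the restore path with TimeFinder/Mirror looks roughly like 
this; a sketch from memory, with an illustrative device group name, and the 
exact symmir syntax should be checked against your Solutions Enabler release:

  # 1. restore the filesystem from tape onto the BCV
  #    (a normal NetBackup restore job, e.g. via bprestore)

  # 2. push the BCV contents back to the standard devices
  symmir -g oradg restore

  # 3. once the restore sync completes, split again so the BCV
  #    is free for the next backup cycle
  symmir -g oradg split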

I'll keep looking into Veritas NetBackup agents and what solutions are 
available for off-host backup of Oracle database files.  Maybe I can spawn an 
Oracle instance on the NetBackup master server to read the content of the 
mounted BCV, and then run the NetBackup Oracle agent there so it can scan the 
database for the changed blocks and write those off, yet I believe this would 
not work for BCV volumes that do not carry a VxFS filesystem.
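
If that pans out, the Oracle side of the incremental would presumably be plain 
RMAN.  A minimal sketch, assuming Oracle 10g so block change tracking is 
available, that the tracking file lives inside the database area and so comes 
across with the split, and that a recovery catalog preserves backup history 
across the resync cycles (file locations and connect strings are illustrative; 
whether RMAN is happy driving incrementals from a split-mirror copy is the 
part I still need to research):

  # once, on the production database: keep a bitmap of changed
  # blocks so level 1 incrementals need not scan every datafile
  echo "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/ora01/bct/change_tracking.f';" | sqlplus "/ as sysdba"

  # each cycle, on the backup host, against the instance
  # started from the BCV copy
  echo "BACKUP INCREMENTAL LEVEL 1 DATABASE;" | rman target / catalog rman/rman@rcat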

I'm just trying to find out what options can get us to an incremental way of 
backing up our data while still performing it off-host.  We have some thumpers 
on the way in to house the backed-up data instead of going BCV to tape, as any 
media failure with tapes stretches the backup window.  A VTL might be a better 
solution, yet I'm hoping a thumper with some ZFS filesystems can act like a 
pseudo-VTL (roughly as sketched below), so that it becomes the preferred 
option for recovery instead of needing to rely directly on tapes.
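
On the thumper end, the incremental mechanics at least look straightforward; a 
sketch assuming the backup copies land in a dataset called tank/oraback and 
are vaulted to a second host (pool, dataset, and host names are illustrative):

  # first cycle: snapshot, then send the full stream
  zfs snapshot tank/oraback@cycle1
  zfs send tank/oraback@cycle1 | ssh vaulthost zfs receive tank/vault/oraback

  # later cycles: snapshot again and send only the blocks that
  # changed between the two snapshots
  zfs snapshot tank/oraback@cycle2
  zfs send -i tank/oraback@cycle1 tank/oraback@cycle2 | \
      ssh vaulthost zfs receive tank/vault/oraback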

thanx for your insight
 
 