Re: [Bacula-users] How to backup structured folder tree (avoiding incremental as much as possible)?

2019-05-31 Thread Dimitri Maziuk via Bacula-users
On 5/31/19 3:24 PM, Lloyd Brown wrote:

> Use a script to generate the list of directories to back up, and shove
> that into the FileSet definition.

E.g. this:

FileSet {
  ...
  Include {
    File = "\\|sh -c 'find /home \\( -name fid -o -name ser \\) -mmin +60 -exec dirname \\{} \\; | sort | uniq'"
    ...
  }
}

works (watch out for wordwrap), but you lose timestamps on all
intermediate directories. That is, you can't see those timestamps when
selecting files for restore.

Unless they fixed it in v. 9.something -- we're still on 7.x.
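
One way to sidestep the wordwrap and quoting problem is to move the find
pipeline into a small script on the client and point File at it. The
following is only a sketch: the function name list_dirs and the /home
default are illustrative, and the script location would be whatever
path you put after the \\| in the FileSet.

```shell
#!/bin/sh
# Hypothetical wrapper around the find pipeline shown above.
# Prints the parent directory of every file named "fid" or "ser" under
# the given root that was modified more than 60 minutes ago, one
# directory per line, with duplicates removed.
list_dirs() {
    find "$1" \( -name fid -o -name ser \) -mmin +60 -exec dirname {} \; \
        | sort | uniq
}

# Root defaults to /home, as in the one-liner; pass another root as $1.
list_dirs "${1:-/home}"
```

Inside a script file the parentheses and the semicolon need only the
usual single shell backslash, since the Bacula config parser never sees
them.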

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] How to backup structured folder tree (avoiding incremental as much as possible)?

2019-05-31 Thread Lloyd Brown
On 5/28/19 1:51 AM, Alexandre wrote:
> I have a product I need to back up data for.
> It stores data in a "contentstore" which is structured in a way that
> makes it very predictable what needs to be backed up every day.
> Let me explain: every time a piece of content is created, it is stored
> under this specific hierarchy:
>
> contentstore_root
>   |_yyyy (e.g. 2019)
>     |_mm (1...12)
>       |_dd (1...31)
>         |_HH (1...24)
>           |_MM (1...59)
> ...


Alexandre,


I do have an idea.  I don't know if it will help you or not.  It
definitely needs to be experimented with, and vetted.


Here we go:

Use a script to generate the list of directories to back up, and shove
that into the FileSet definition. Your script could just specify the
current date's directories, and those of some number of days before and
after the current date (e.g. +/- 5 days).  Then those would be the only
directories that the Bacula FD would traverse when doing the backup.

Here's an example of the fileset syntax that includes a script that
outputs the newline-separated list of files:

> FileSet {
>     Name = "FilesetName"
>     Include {
>         Options {
>             signature = SHA1
>             compression = GZIP
>         }
>         File = "\\|/path/to/script/on/fd"
>     }
> }
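
As a rough illustration of what such a script might look like (untested
against a real contentstore, so treat it as a sketch): the ROOT path,
the DAYS window and the unpadded yyyy/mm/dd naming in FORMAT are all
assumptions, and it relies on GNU date for the -d option.

```shell
#!/bin/sh
# Sketch only: prints the contentstore day directories that fall within
# +/- DAYS of today, one per line. ROOT, DAYS and FORMAT are assumptions;
# FORMAT guesses unpadded yyyy/mm/dd names. Needs GNU date (-d option).
ROOT="${ROOT:-/contentstore_root}"
DAYS="${DAYS:-5}"
FORMAT="${FORMAT:-%Y/%-m/%-d}"

# list_window REFERENCE_DATE FROM_OFFSET TO_OFFSET (offsets in days)
list_window() {
    # Anchor at local noon so DST changes cannot shift the calendar day.
    ref=$(date -d "$1 12:00" +%s)
    offset="$2"
    while [ "$offset" -le "$3" ]; do
        dir="$ROOT/$(date -d "@$((ref + offset * 86400))" "+$FORMAT")"
        # Skip days that have no directory (yet), e.g. future dates.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
        offset=$((offset + 1))
    done
}

list_window "$(date +%F)" "-$DAYS" "$DAYS"
```

Only the day-level directories are printed; the FD should descend into
the HH and MM levels below each of them by itself.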

In this scenario, since the FileSet definition itself doesn't actually
change, Bacula won't upgrade your incremental to a full, for example, as
long as an older full exists.  Yes, the output of the script does change,
but that doesn't seem to be enough to trigger the automatic level upgrade.

There are a few enormous questions, though:

- What happens when a previously-backed-up directory is no longer
listed in the output of the script?  Do those files remain available to
restore, or are they treated as if they were deleted?

- Does the size of your ongoing VirtualFull keep growing over time? Is
that going to be a problem?

- Is there ever a time when you no longer want to be able to restore old
files? Do you need to have some kind of manual file expiration mechanism?

- Would it make sense to sometimes include old directories, after
they've been emptied, so that your job can figure out that the files have
been moved to the trash folder instead?  For example, if files are trashed
after, say, 90 days, then your script could output the list of new
directories to back up, and a list of 5 directories that are 90-95 days old.
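
To make that last idea concrete, here is a sketch of a script that prints
a recent window plus an older window around the trash age. GRACE_DAYS,
ROOT, the window sizes and the unpadded yyyy/mm/dd naming in FORMAT are
all assumptions, and GNU date is required. Whether Bacula then treats
the vanished files as deleted is exactly the open question above.

```shell
#!/bin/sh
# Sketch only. ROOT, FORMAT, GRACE_DAYS and the window sizes are
# assumptions, not values from a real setup. Needs GNU date (-d option).
ROOT="${ROOT:-/contentstore_root}"
FORMAT="${FORMAT:-%Y/%-m/%-d}"
GRACE_DAYS="${GRACE_DAYS:-90}"

# list_window REFERENCE_DATE FROM_OFFSET TO_OFFSET (offsets in days)
list_window() {
    # Anchor at local noon so DST changes cannot shift the calendar day.
    ref=$(date -d "$1 12:00" +%s)
    offset="$2"
    while [ "$offset" -le "$3" ]; do
        dir="$ROOT/$(date -d "@$((ref + offset * 86400))" "+$FORMAT")"
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
        offset=$((offset + 1))
    done
}

# list_backup_dirs REFERENCE_DATE
list_backup_dirs() {
    # Recent window: 5 days either side of the reference date.
    list_window "$1" -5 5
    # Old window: GRACE_DAYS to GRACE_DAYS+5 days back, inclusive.
    list_window "$1" "-$((GRACE_DAYS + 5))" "-$GRACE_DAYS"
}

list_backup_dirs "$(date +%F)"
```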




Good luck,

Lloyd


-- 
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://marylou.byu.edu





[Bacula-users] How to backup structured folder tree (avoiding incremental as much as possible)?

2019-05-28 Thread Alexandre
I have a product I need to back up data for.
It stores data in a "contentstore" which is structured in a way that makes
it very predictable what needs to be backed up every day.
Let me explain: every time a piece of content is created, it is stored
under this specific hierarchy:

contentstore_root
  |_yyyy (e.g. 2019)
    |_mm (1...12)
      |_dd (1...31)
        |_HH (1...24)
          |_MM (1...59)

This directory structure is stored on a GFS filesystem, which has proven to
be very poor at traversing long lists of files (a caveat of all distributed
filesystems, I guess). For example, running `ls -lR` on the GFS mountpoint
is significantly slower than on a local extX filesystem. This of course
affects the time taken by incremental backups, to the point where
backing up the contentstore root (which contains tens of millions of files)
would take more than half a day.

It should also be noted that when a file is updated on the system, the old
version stays where it was and the new version is stored as a new file
under the "new date directory path". When a file is deleted, it remains on
the filesystem for a "grace period" before an internal job of the
application moves it to a trash folder.

As a consequence, given the directory structure and the general behaviour
of the application, I don't feel that incremental backups are really
needed, and I'd like to get rid of them if possible to avoid those long
backups.
However, every configuration I could come up with has a huge drawback in
one way or another.
For example, a daily backup of the folder of the past day plus a monthly
backup of the folder of the past month makes a full restore very hard and
complex in case of disaster recovery.

I'm really interested in knowing whether anybody is dealing with a similar
kind of backup and how they handle it. I'd also like to hear any ideas on
which features could be helpful in my case (I took a look at VirtualFull,
but that doesn't seem to really solve the burden of backup administration).
In short, any insight is appreciated.

I'd like to avoid local (tar) archiving as much as possible, since we are
talking about tens of TB of data here and can't really afford to use twice
the space just to be able to back it up.

Regards