Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Jon LaBadie
On Fri, Mar 23, 2018 at 09:40:34AM -0400, Austin S. Hemmelgarn wrote:
> On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote:
> > "Ryan, Lyle (US)" writes:
> > 
> > > The server has an 11TB filesystem to store the backups in.  I should
> > > probably be fancier and split this up more, but not now.   So I've got my
> > > holding, state, and vtapes directories all in there.
> > 
> > In this scenario, I would think there's no point to a "holding" disk.
> > 
> > I use a holding disk because my actual backup disk is external-USB and
> > (comparatively) slow.  So I backup to a holding disk on my internal
> > SSD, releasing the client and the network as soon as possible, and then
> > copy the backup to the backup drive afterwards.  But in your case, I
> > don't see any benefit.
> There are two other benefits to having a holding disk:
> 
> 1. It lets you run dumps in parallel.  Without a holding disk (or some
> somewhat complicated setup of the vtapes to allow parallel taping), you can
> only dump one DLE at a time because it dumps directly to tape.
> 
> 2. It lets you defer taping until you have some minimum amount of data ready
> to be taped.  This may sound kind of useless when working with vtapes, but
> if the holding disk is on the same device as the final vtape library,
> deferring until the dumps are all done (or at least, almost all done) can
> help improve dumping performance, because the dump processes won't be
> competing with the taper process for disk bandwidth.
>>> End of included message <<<

3. If something happens to the data storage device(s), the holding
disk (HD) can continue to collect your backups.  My HD is big
enough to hold about 4 typical runs.  Should the storage outage
be protracted and HD space get low, Amanda switches to "degraded"
mode and only does incrementals.
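
For anyone hunting for the knob: if I remember right, this threshold
is governed by the reserve parameter in amanda.conf (the value below
is illustrative, not a recommendation):

# percentage of holding-disk space set aside for degraded-mode
# incrementals when no (v)tape is reachable; the default is 100
reserve 30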

jl
-- 
Jon H. LaBadie j...@jgcomp.com
 11226 South Shore Rd.  (703) 787-0688 (H)
 Reston, VA  20190  (703) 935-6720 (C)


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Schlacta, Christopher
> Keep in mind that you can pass extra
> options to any compression program you want by using the custom compression
> support and a wrapper script like this:
>
> #!/bin/bash
> # Quote "$@" so arguments survive word splitting; Amanda may also
> # invoke this with -d when it needs to decompress.
> exec /path/to/program --options "$@"
>
> If you can get it on your distribution, I'd suggest looking into zstandard
> [1] for compression.  The default settings for it compress both better _and_
> faster than the default gzip settings.

According to their own website, https://facebook.github.io/zstd/, zstd
has the best compression ratios; lz4, however, provides the fastest
compression and decompression times with still-competitive ratios.
The point is: optimize for the attribute *you* need more.  A faster
algorithm means you can spend less time in compression; a higher ratio
means you'll spend less space on disk (obviously).  So pick the
algorithm with the correct balance...
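
For instance, a concrete version of the quoted wrapper for zstd could
look like this (a sketch: the install path and compression level are
assumptions, and it relies on zstd accepting the -d flag that Amanda
passes for decompression):

#!/bin/bash
# zstd as an Amanda custom-compression filter: stdin to stdout.
# -c forces stdout, -T0 uses all cores; "$@" forwards whatever
# Amanda passes (notably -d on restore).
exec /usr/bin/zstd -c -3 -T0 "$@"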

Also bear in mind that some data types (images, audio, video, etc.)
are largely incompressible.  I don't recall if you've said what you're
backing up, but in those cases it's usually better to take one
super-fast pass to zip up the metadata and not dwell on ratios much.
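
If you want to measure rather than guess, something along these lines
gives a rough feel for ratio and speed on a sample of the real data
(the sample path is a placeholder):

# compressed size in bytes, plus wall time, for two candidates
time ( tar cf - /data/sample | pigz -p "$(nproc)" | wc -c )
time ( tar cf - /data/sample | zstd -c -T0 | wc -c )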

Finally, if you have mixed DLEs (for example, one storing computed
tomography results and another storing raw patient data), you can use
a different algorithm on each, such as lz4 for improved speed on the
CT images and zstd for higher compression on the patient data.
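
A hedged sketch of what that per-DLE split could look like; the
dumptype names, hostnames, paths, and wrapper locations are all
invented, and it assumes the stock "global" dumptype exists:

# amanda.conf
define dumptype comp-lz4 {
    global
    compress server custom
    server_custom_compress "/usr/local/sbin/amanda-lz4"   # lz4 wrapper
}
define dumptype comp-zstd {
    global
    compress server custom
    server_custom_compress "/usr/local/sbin/amanda-zstd"  # zstd wrapper
}

# disklist
imaging.example.com  /data/ct-images  comp-lz4
records.example.com  /data/patients   comp-zstd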


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Gene Heskett
On Friday 23 March 2018 08:01:30 Austin S. Hemmelgarn wrote:

> On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
> > I've got an Amanda 3.4.5 server running on Centos 7 now, and am able
> > to do rudimentary backups of a remote client.
> >
> > But in spite of reading man pages, HowTo's, etc, I need help
> > choosing config params.  I don't mind continuing to read and
> > experiment, but if someone could get me at least in the ballpark,
> > I'd really appreciate it.
> >
> > The server has an 11TB filesystem to store the backups in.  I should
> > probably be fancier and split this up more, but not now.   So I've
> > got my holding, state, and vtapes directories all in there.
> >
> > The main client I want to back up has 4TB I want to backup.  It's
> > almost all in one filesystem, but the HowTo for splitting DLE's with
> > exclude lists is clear, so it should be easy to split this into
> > (say) 10 smaller individual dumps.  The bulk of the data is pretty
> > static, maybe 10%/month changes.  It's hard to imagine 20%/month
> > changing.
> >
> > For a start, I'd like to get a full done every 2 weeks, and
> > incrementals/differentials on the intervening days.   If I have room
> > to keep 2 fulls (2 complete dumpcycles) that would be great.
>
> Given what you've said, you should have enough room to do so, but only
> if you use compression.  Assuming the rate of change you quote above is
> approximately constant and doesn't result in bumping to a level higher
> than 1, then without compression you will need roughly 4.015TB per
> cycle (4TB for the full backup, ~15.38GB for the incrementals (roughly
> 0.38% change per day for 13 days)), plus 4TB of space for the holding
> disk (because you have to have room for a full backup _there_ prior to
> taping anything).  With compression and assuming you get a compression
> ratio of about 50%, you should actually be able to fit four complete
> cycles (you would need about 2.0075TB per cycle), though if you decide
> you want that I would bump the tapecycle to 60 and the number of slots
> to 60.
>
> > So I'm thinking:
> >
> > - dumpcycle = 14
> >
> > - runspercycle = 0 (default)
> >
> > - tapecycle = 30
> >
> > - runtapes = 1 (default)
> >
> > I'd break the filesystem into 10 pieces, so 400GB each, and make the
> > vtapes 400GB each (with tapetype length) relying on server-side
> > compression to make it fit.
> >
> > The HowTo "Use pigz to speed compression" looks clear, and the DL380
> > G7 isn't doing anything else, so server-side compression sounds
> > good.
> >
> > Any advice on this or better ideas?  Maybe I'm off in left-field.
> >
> > And one bonus question:  I'm assuming Amanda will just make vtapes
> > as necessary, but is there any guidance as to how many vtape slots I
> > should create ahead of time?  If my dumpcycle=14, maybe create 14
> > slots just to make tapes easier to find?
>
> Debra covered the requirements for vtapes, slots, and everything very
> well in her reply, so I won't repeat any of that here.  I do however
> have some other more generic advice I can give based on my own
> experience:
>
> * Make your vtapes as large as possible.  They won't take up any space
> beyond what's stored on them (in storage terminology, they're thinly
> provisioned), so their total 'virtual' size can be far more than your
> actual storage capacity, but if you can make it so that you can always
> fit a full backup on a single vtape, it will make figuring out how
> many vtapes you need easier, and additionally give a slight boost to
> taping performance (because the taper never has to stop to switch to a
> new vtape).  In your case, I'd say setting 5TB for your vtape size is
> reasonable; that would give you some extra room if you suddenly have
> more data without being insanely over-sized.
>
> * Make sure to set a reasonable part_size for your vtapes.  While you
> wouldn't have to worry about splitting dumps if you take my above
> advice about vtape size, using parts has some other performance
> related advantages.  I normally use 1G, but all of my dumps are less
> than 100G in size.  In your case, if you'll have 10 400G dumps, I'd
> probably go for 4G for the part size.
>
> * Match your holding disk chunk size to your vtape's part_size.  I
> have no hard number to back this up, but it appears to provide a
> slight performance improvement while dumping data.
>
> * Don't worry right now about parallelizing the taping process.  It's
> somewhat complicated to get it working right, significantly changes
> how you have to calculate vtape slots and sizes, and will probably not
> provide much benefit unless you're taping to a really fast RAID array
> that does a very good job of handling parallel writes.
>
> * There's essentially zero performance benefit to having your holding
> disk on a separate partition from your final storage unless you have
> it on a completely separate disk.  There are some benefits in terms of
> reliability, but realizing them requires some significant planning
> (you have to figure out exactly what amount of space your holding
> disk will need).

Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Austin S. Hemmelgarn

On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote:

"Ryan, Lyle (US)" writes:


The server has an 11TB filesystem to store the backups in.  I should
probably be fancier and split this up more, but not now.   So I've got my
holding, state, and vtapes directories all in there.


In this scenario, I would think there's no point to a "holding" disk.

I use a holding disk because my actual backup disk is external-USB and
(comparatively) slow.  So I backup to a holding disk on my internal
SSD, releasing the client and the network as soon as possible, and then
copy the backup to the backup drive afterwards.  But in your case, I
don't see any benefit.

There are two other benefits to having a holding disk:

1. It lets you run dumps in parallel.  Without a holding disk (or some 
somewhat complicated setup of the vtapes to allow parallel taping), you 
can only dump one DLE at a time because it dumps directly to tape.


2. It lets you defer taping until you have some minimum amount of data 
ready to be taped.  This may sound kind of useless when working with 
vtapes, but if the holding disk is on the same device as the final vtape 
library, deferring until the dumps are all done (or at least, almost all 
done) can help improve dumping performance, because the dump processes 
won't be competing with the taper process for disk bandwidth.
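
For reference, a hedged sketch of the amanda.conf parameters behind
these two points, as I understand them (values are illustrative, not
recommendations):

inparallel 4                 # point 1: dump up to 4 DLEs concurrently
flush-threshold-dumped 100   # point 2: don't start taping until the
                             # holding disk holds a full tape's worth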


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread hymie
"Ryan, Lyle (US)" writes:

>The server has an 11TB filesystem to store the backups in.  I should
>probably be fancier and split this up more, but not now.   So I've got my
>holding, state, and vtapes directories all in there.

In this scenario, I would think there's no point to a "holding" disk.

I use a holding disk because my actual backup disk is external-USB and
(comparatively) slow.  So I backup to a holding disk on my internal
SSD, releasing the client and the network as soon as possible, and then
copy the backup to the backup drive afterwards.  But in your case, I
don't see any benefit.

(But I'm certainly not an expert, so if somebody contradicts me, then
follow their advice.)

>And one bonus question:  I'm assuming Amanda will just make vtapes as
>necessary, but is there any guidance as to how many vtape slots I should
>create ahead of time?  If my dumpcycle=14, maybe create 14 slots just to
>make tapes easier to find?

If my memory is correct (I set mine up a long time ago), you would be
better off just letting Amanda do what it wants/needs.  That way, you
don't have to worry about permissions / naming conventions / etc.
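
With a chg-disk changer you mostly just create the slot directories
and let Amanda label vtapes on demand; a hedged sketch (the path and
label pattern are placeholders):

# create slot directories for a chg-disk vtape changer
mkdir -p /backups/vtapes/slot{1..30}

# amanda.conf: label blank vtapes automatically on first use
autolabel "MyData-%%%" empty volume_error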

--hymie!


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Austin S. Hemmelgarn

On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
I've got an Amanda 3.4.5 server running on Centos 7 now, and am able to 
do rudimentary backups of a remote client.


But in spite of reading man pages, HowTo's, etc, I need help choosing 
config params.  I don't mind continuing to read and experiment, but if 
someone could get me at least in the ballpark, I'd really appreciate it.


The server has an 11TB filesystem to store the backups in.  I should 
probably be fancier and split this up more, but not now.   So I've got 
my holding, state, and vtapes directories all in there.


The main client I want to back up has 4TB I want to backup.  It's almost 
all in one filesystem, but the HowTo for splitting DLE's with exclude 
lists is clear, so it should be easy to split this into (say) 10 smaller 
individual dumps.  The bulk of the data is pretty static, maybe 
10%/month changes.  It's hard to imagine 20%/month changing.


For a start, I'd like to get a full done every 2 weeks, and 
incrementals/differentials on the intervening days.   If I have room to 
keep 2 fulls (2 complete dumpcycles) that would be great.
Given what you've said, you should have enough room to do so, but only 
if you use compression.  Assuming the rate of change you quote above is
approximately constant and doesn't result in bumping to a level higher 
than 1, then without compression you will need roughly 4.015TB per cycle 
(4TB for the full backup, ~15.38GB for the incrementals (roughly 0.38% 
change per day for 13 days)), plus 4TB of space for the holding disk 
(because you have to have room for a full backup _there_ prior to taping 
anything).  With compression and assuming you get a compression ratio of 
about 50%, you should actually be able to fit four complete cycles (you 
would need about 2.0075TB per cycle), though if you decide you want that 
I would bump the tapecycle to 60 and the number of slots to 60.


So I'm thinking:

- dumpcycle = 14

- runspercycle = 0 (default)

- tapecycle = 30

- runtapes = 1 (default)

I'd break the filesystem into 10 pieces, so 400GB each, and make the
vtapes 400GB each (with tapetype length) relying on server-side 
compression to make it fit.


The HowTo "Use pigz to speed compression" looks clear, and the DL380 G7 
isn't doing anything else, so server-side compression sounds good.


Any advice on this or better ideas?  Maybe I'm off in left-field.

And one bonus question:  I'm assuming Amanda will just make vtapes as 
necessary, but is there any guidance as to how many vtape slots I should 
create ahead of time?  If my dumpcycle=14, maybe create 14 slots just to 
make tapes easier to find?


Debra covered the requirements for vtapes, slots, and everything very 
well in her reply, so I won't repeat any of that here.  I do however 
have some other more generic advice I can give based on my own experience:


* Make your vtapes as large as possible.  They won't take up any space 
beyond what's stored on them (in storage terminology, they're thinly 
provisioned), so their total 'virtual' size can be far more than your 
actual storage capacity, but if you can make it so that you can always 
fit a full backup on a single vtape, it will make figuring out how many 
vtapes you need easier, and additionally give a slight boost to taping 
performance (because the taper never has to stop to switch to a new 
vtape).  In your case, I'd say setting 5TB for your vtape size is
reasonable; that would give you some extra room if you suddenly have
more data without being insanely over-sized.  (A consolidated config
sketch at the end of this message pulls these numbers together.)


* Make sure to set a reasonable part_size for your vtapes.  While you 
wouldn't have to worry about splitting dumps if you take my above advice 
about vtape size, using parts has some other performance related 
advantages.  I normally use 1G, but all of my dumps are less than 100G 
in size.  In your case, if you'll have 10 400G dumps, I'd probably go 
for 4G for the part size.


* Match your holding disk chunk size to your vtape's part_size.  I have 
no hard number to back this up, but it appears to provide a slight 
performance improvement while dumping data.


* Don't worry right now about parallelizing the taping process.  It's 
somewhat complicated to get it working right, significantly changes how 
you have to calculate vtape slots and sizes, and will probably not 
provide much benefit unless you're taping to a really fast RAID array 
that does a very good job of handling parallel writes.


* There's essentially zero performance benefit to having your holding 
disk on a separate partition from your final storage unless you have it 
on a completely separate disk.  There are some benefits in terms of 
reliability, but realizing them requires some significant planning (you 
have to figure out exactly what amount of space your holding disk will 
need).


* If you're indexing the backups, store the working index directory (the 
one Amanda actually reads and writes to) on a separate drive from the 
holding disk and final backup
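
Pulling the numbers suggested in this thread into one place, a hedged
amanda.conf sketch (the 14/30/5TB/4G values mirror the advice above;
paths and names are placeholders, not a tested configuration):

define tapetype VTAPE-5T {
    length 5 tbytes      # thinly provisioned, so oversizing is cheap
    part_size 4 gbytes   # split dumps into 4G parts
}

define holdingdisk hd1 {
    directory "/backups/holding"
    use 4200 gbytes      # room for one full backup before taping
    chunksize 4 gbytes   # matched to part_size above
}

dumpcycle 14 days
runspercycle 0
tapecycle 30 tapes
runtapes 1
tapetype VTAPE-5T
tpchanger "chg-disk:/backups/vtapes"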