Re: some suggested config parameters for backups to local disk
On Fri, Mar 23, 2018 at 09:40:34AM -0400, Austin S. Hemmelgarn wrote:
> On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote:
> > "Ryan, Lyle (US)" writes:
> > > The server has an 11TB filesystem to store the backups in.  I should
> > > probably be fancier and split this up more, but not now.  So I've got
> > > my holding, state, and vtapes directories all in there.
> >
> > In this scenario, I would think there's no point to a "holding" disk.
> >
> > I use a holding disk because my actual backup disk is external-USB and
> > (comparatively) slow.  So I back up to a holding disk on my internal
> > SSD, releasing the client and the network as soon as possible, and then
> > copy the backup to the backup drive afterwards.  But in your case, I
> > don't see any benefit.
>
> There are two other benefits to having a holding disk:
>
> 1. It lets you run dumps in parallel.  Without a holding disk (or some
> somewhat complicated setup of the vtapes to allow parallel taping), you
> can only dump one DLE at a time, because it dumps directly to tape.
>
> 2. It lets you defer taping until you have some minimum amount of data
> ready to be taped.  This may sound kind of useless when working with
> vtapes, but if the holding disk is on the same device as the final
> vtape library, deferring until the dumps are all done (or at least,
> almost all done) can help improve dumping performance, because the dump
> processes won't be competing with the taper process for disk bandwidth.

3. If something happens to the data storage device(s), the holding disk
(HD) can continue to collect your backups.  My HD is big enough to hold
about 4 typical runs.  Should the storage outage be protracted and HD
space gets low, Amanda switches to "degraded" mode and only does
incrementals.

jl
--
Jon H. LaBadie                  j...@jgcomp.com
11226 South Shore Rd.           (703) 787-0688 (H)
Reston, VA 20190                (703) 935-6720 (C)
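[Ed.: Jon's point 3 corresponds to two amanda.conf settings: the holdingdisk
definition and `reserve`, which keeps a percentage of holding space free for
degraded-mode incrementals.  A minimal sketch follows; the directory, size,
and percentage are made-up examples, not recommendations.]

```
# amanda.conf fragment -- illustrative values only
holdingdisk hd1 {
    directory "/backups/holding"   # where dumps land before taping
    use 500 Gb                     # cap on how much space Amanda may use here
    chunksize 1 Gb                 # split holding files into chunks
}

# Percent of holding-disk space reserved for degraded-mode incremental
# dumps when the tape/vtape storage is unavailable.
reserve 30
```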
Re: some suggested config parameters for backups to local disk
> Keep in mind that you can pass extra
> options to any compression program you want by using the custom
> compression support and a wrapper script like this:
>
> #!/bin/bash
> /path/to/program --options "$@"
>
> If you can get it on your distribution, I'd suggest looking into
> zstandard [1] for compression.  The default settings for it compress
> both better _and_ faster than the default gzip settings.

According to their own website, https://facebook.github.io/zstd/, zstd
does have the best compression ratios; lz4, however, provides the
fastest compression and decompression times with still-competitive
ratios.

The point is: optimize for the attribute *you* need more.  A faster
algorithm means you spend less time compressing; a higher ratio means
you use less space on disk (obviously).  So pick the algorithm with the
right balance for your workload.

Also bear in mind that some data types (images, audio, video, etc.) are
largely incompressible.  I don't recall if you've said what you're
backing up, but in those cases it's usually better to take one
super-fast pass over the data and not dwell on ratios much.

Finally, if you have mixed DLEs (for example, one storing computed
tomography results and another storing raw patient data), you can use
different algorithms on them: lz4 for improved speed on the CT images,
and zstd for higher compression on the patient data.
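[Ed.: to make the quoted wrapper idea concrete, the sketch below builds a
wrapper around plain gzip (substitute zstd, lz4, or pigz if you have them;
the ./amgzip path and the compression level are arbitrary choices).  An
Amanda custom-compression program must read stdin, write stdout, and accept
-d for decompression, which all of these tools do.]

```shell
# Create a hypothetical custom-compression wrapper at ./amgzip.
# Amanda pipes dump data on stdin and passes -d when restoring, so the
# wrapper forwards all arguments to gzip via "$@" (note the quotes,
# unlike the snippet quoted above).
cat > ./amgzip <<'EOF'
#!/bin/sh
exec gzip -6 "$@"
EOF
chmod +x ./amgzip
```

You would then point a dumptype at it with something like
server_custom_compress "/path/to/amgzip".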
Re: some suggested config parameters for backups to local disk
On Friday 23 March 2018 08:01:30 Austin S. Hemmelgarn wrote:
> On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
> > I've got an Amanda 3.4.5 server running on Centos 7 now, and am able
> > to do rudimentary backups of a remote client.
> >
> > But in spite of reading man pages, HowTo's, etc, I need help
> > choosing config params.  I don't mind continuing to read and
> > experiment, but if someone could get me at least in the ballpark,
> > I'd really appreciate it.
> >
> > The server has an 11TB filesystem to store the backups in.  I should
> > probably be fancier and split this up more, but not now.  So I've
> > got my holding, state, and vtapes directories all in there.
> >
> > The main client I want to back up has 4TB I want to back up.  It's
> > almost all in one filesystem, but the HowTo for splitting DLEs with
> > exclude lists is clear, so it should be easy to split this into
> > (say) 10 smaller individual dumps.  The bulk of the data is pretty
> > static, maybe 10%/month changes.  It's hard to imagine 20%/month
> > changing.
> >
> > For a start, I'd like to get a full done every 2 weeks, and
> > incrementals/differentials on the intervening days.  If I have room
> > to keep 2 fulls (2 complete dumpcycles) that would be great.
>
> Given what you've said, you should have enough room to do so, but only
> if you use compression.  Assuming the rate of change you quote above is
> approximately constant and doesn't result in bumping to a level higher
> than 1, then without compression you will need roughly 4.015TB per
> cycle (4TB for the full backup, ~15.38GB for the incrementals (roughly
> 0.38% change per day for 13 days)), plus 4TB of space for the holding
> disk (because you have to have room for a full backup _there_ prior to
> taping anything).
>
> With compression, and assuming you get a compression ratio of about
> 50%, you should actually be able to fit four complete cycles (you would
> need about 2.0075TB per cycle), though if you decide you want that I
> would bump the tapecycle to 60 and the number of slots to 60.
>
> > So I'm thinking:
> >
> > - dumpcycle = 14
> > - runspercycle = 0 (default)
> > - tapecycle = 30
> > - runtapes = 1 (default)
> >
> > I'd break the filesystem into 10 pieces, so 400GB each, and make the
> > vtapes 400GB each (with tapetype length), relying on server-side
> > compression to make it fit.
> >
> > The HowTo "Use pigz to speed compression" looks clear, and the DL380
> > G7 isn't doing anything else, so server-side compression sounds
> > good.
> >
> > Any advice on this or better ideas?  Maybe I'm off in left-field.
> >
> > And one bonus question: I'm assuming Amanda will just make vtapes
> > as necessary, but is there any guidance as to how many vtape slots I
> > should create ahead of time?  If my dumpcycle=14, maybe create 14
> > slots just to make tapes easier to find?
>
> Debra covered the requirements for vtapes, slots, and everything very
> well in her reply, so I won't repeat any of that here.  I do however
> have some other, more generic advice I can give based on my own
> experience:
>
> * Make your vtapes as large as possible.  They won't take up any space
> beyond what's stored on them (in storage terminology, they're thinly
> provisioned), so their total 'virtual' size can be far more than your
> actual storage capacity, but if you can make it so that you can always
> fit a full backup on a single vtape, it will make figuring out how
> many vtapes you need easier, and additionally give a slight boost to
> taping performance (because the taper never has to stop to switch to a
> new vtape).  In your case, I'd say starting at 5TB for your vtape size
> is reasonable; that would give you some extra room if you suddenly have
> more data without being insanely over-sized.
>
> * Make sure to set a reasonable part_size for your vtapes.  While you
> wouldn't have to worry about splitting dumps if you take my above
> advice about vtape size, using parts has some other performance-related
> advantages.  I normally use 1G, but all of my dumps are less than 100G
> in size.  In your case, if you'll have 10 400G dumps, I'd probably go
> for 4G for the part size.
>
> * Match your holding disk chunk size to your vtape's part_size.  I
> have no hard number to back this up, but it appears to provide a
> slight performance improvement while dumping data.
>
> * Don't worry right now about parallelizing the taping process.  It's
> somewhat complicated to get it working right, significantly changes
> how you have to calculate vtape slots and sizes, and will probably not
> provide much benefit unless you're taping to a really fast RAID array
> that does a very good job of handling parallel writes.
>
> * There's essentially zero performance benefit to having your holding
> disk on a separate partition from your final storage unless you have
> it on a completely separate disk.  There are some benefits in terms of
> reliability, but realizing them requires some significant planning (you
> have to figure out exactly what amount of space your holding disk will
> need).
Re: some suggested config parameters for backups to local disk
On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote:
> "Ryan, Lyle (US)" writes:
> > The server has an 11TB filesystem to store the backups in.  I should
> > probably be fancier and split this up more, but not now.  So I've got
> > my holding, state, and vtapes directories all in there.
>
> In this scenario, I would think there's no point to a "holding" disk.
>
> I use a holding disk because my actual backup disk is external-USB and
> (comparatively) slow.  So I back up to a holding disk on my internal
> SSD, releasing the client and the network as soon as possible, and then
> copy the backup to the backup drive afterwards.  But in your case, I
> don't see any benefit.

There are two other benefits to having a holding disk:

1. It lets you run dumps in parallel.  Without a holding disk (or some
somewhat complicated setup of the vtapes to allow parallel taping), you
can only dump one DLE at a time, because it dumps directly to tape.

2. It lets you defer taping until you have some minimum amount of data
ready to be taped.  This may sound kind of useless when working with
vtapes, but if the holding disk is on the same device as the final vtape
library, deferring until the dumps are all done (or at least, almost all
done) can help improve dumping performance, because the dump processes
won't be competing with the taper process for disk bandwidth.
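[Ed.: the deferral described in point 2 is tunable in amanda.conf via the
flush thresholds, expressed as a percentage of a (v)tape's capacity.  A
sketch with illustrative values:]

```
# amanda.conf fragment -- illustrative values only
flush-threshold-dumped 100     # don't start taping until dumps totalling
                               # 100% of a vtape's length are in holding
flush-threshold-scheduled 100  # same, but also counting dumps that are
                               # scheduled and not yet finished
taperflush 0                   # flush everything to vtapes by run's end
```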
Re: some suggested config parameters for backups to local disk
"Ryan, Lyle (US)" writes:
> The server has an 11TB filesystem to store the backups in.  I should
> probably be fancier and split this up more, but not now.  So I've got my
> holding, state, and vtapes directories all in there.

In this scenario, I would think there's no point to a "holding" disk.

I use a holding disk because my actual backup disk is external-USB and
(comparatively) slow.  So I back up to a holding disk on my internal
SSD, releasing the client and the network as soon as possible, and then
copy the backup to the backup drive afterwards.  But in your case, I
don't see any benefit.  (But I'm certainly not an expert, so if somebody
contradicts me, then follow their advice.)

> And one bonus question: I'm assuming Amanda will just make vtapes as
> necessary, but is there any guidance as to how many vtape slots I should
> create ahead of time?  If my dumpcycle=14, maybe create 14 slots just to
> make tapes easier to find?

If my memory is correct (I set mine up a long time ago), you would be
better off just letting Amanda do what it wants/needs.  That way, you
don't have to worry about permissions / naming conventions / etc.

--hymie!
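[Ed.: if you do decide to pre-create slots for the chg-disk changer, they
are just directories named slotN under the vtape root.  A sketch, assuming
a relative ./vtapes directory and 30 slots; match the count to your
tapecycle and adjust the path to your layout:]

```shell
# Pre-create 30 vtape slot directories for Amanda's chg-disk changer.
# ./vtapes is a placeholder path; in practice use your real vtape root
# and make sure the amanda user ends up owning the directories.
VTAPE_ROOT=./vtapes
for i in $(seq 1 30); do
    mkdir -p "$VTAPE_ROOT/slot$i"
done
```

Each slot still needs a label (amlabel <config> <label> slot <n>) before
Amanda will write to it.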
Re: some suggested config parameters for backups to local disk
On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
> I've got an Amanda 3.4.5 server running on Centos 7 now, and am able
> to do rudimentary backups of a remote client.
>
> But in spite of reading man pages, HowTo's, etc, I need help
> choosing config params.  I don't mind continuing to read and
> experiment, but if someone could get me at least in the ballpark,
> I'd really appreciate it.
>
> The server has an 11TB filesystem to store the backups in.  I should
> probably be fancier and split this up more, but not now.  So I've
> got my holding, state, and vtapes directories all in there.
>
> The main client I want to back up has 4TB I want to back up.  It's
> almost all in one filesystem, but the HowTo for splitting DLEs with
> exclude lists is clear, so it should be easy to split this into
> (say) 10 smaller individual dumps.  The bulk of the data is pretty
> static, maybe 10%/month changes.  It's hard to imagine 20%/month
> changing.
>
> For a start, I'd like to get a full done every 2 weeks, and
> incrementals/differentials on the intervening days.  If I have room
> to keep 2 fulls (2 complete dumpcycles) that would be great.

Given what you've said, you should have enough room to do so, but only
if you use compression.  Assuming the rate of change you quote above is
approximately constant and doesn't result in bumping to a level higher
than 1, then without compression you will need roughly 4.015TB per
cycle (4TB for the full backup, ~15.38GB for the incrementals (roughly
0.38% change per day for 13 days)), plus 4TB of space for the holding
disk (because you have to have room for a full backup _there_ prior to
taping anything).

With compression, and assuming you get a compression ratio of about 50%,
you should actually be able to fit four complete cycles (you would need
about 2.0075TB per cycle), though if you decide you want that I would
bump the tapecycle to 60 and the number of slots to 60.

> So I'm thinking:
>
> - dumpcycle = 14
> - runspercycle = 0 (default)
> - tapecycle = 30
> - runtapes = 1 (default)
>
> I'd break the filesystem into 10 pieces, so 400GB each, and make the
> vtapes 400GB each (with tapetype length), relying on server-side
> compression to make it fit.
>
> The HowTo "Use pigz to speed compression" looks clear, and the DL380
> G7 isn't doing anything else, so server-side compression sounds
> good.
>
> Any advice on this or better ideas?  Maybe I'm off in left-field.
>
> And one bonus question: I'm assuming Amanda will just make vtapes
> as necessary, but is there any guidance as to how many vtape slots I
> should create ahead of time?  If my dumpcycle=14, maybe create 14
> slots just to make tapes easier to find?

Debra covered the requirements for vtapes, slots, and everything very
well in her reply, so I won't repeat any of that here.  I do however
have some other, more generic advice I can give based on my own
experience:

* Make your vtapes as large as possible.  They won't take up any space
beyond what's stored on them (in storage terminology, they're thinly
provisioned), so their total 'virtual' size can be far more than your
actual storage capacity, but if you can make it so that you can always
fit a full backup on a single vtape, it will make figuring out how
many vtapes you need easier, and additionally give a slight boost to
taping performance (because the taper never has to stop to switch to a
new vtape).  In your case, I'd say starting at 5TB for your vtape size
is reasonable; that would give you some extra room if you suddenly have
more data without being insanely over-sized.

* Make sure to set a reasonable part_size for your vtapes.  While you
wouldn't have to worry about splitting dumps if you take my above
advice about vtape size, using parts has some other performance-related
advantages.  I normally use 1G, but all of my dumps are less than 100G
in size.  In your case, if you'll have 10 400G dumps, I'd probably go
for 4G for the part size.

* Match your holding disk chunk size to your vtape's part_size.  I
have no hard number to back this up, but it appears to provide a
slight performance improvement while dumping data.

* Don't worry right now about parallelizing the taping process.  It's
somewhat complicated to get it working right, significantly changes
how you have to calculate vtape slots and sizes, and will probably not
provide much benefit unless you're taping to a really fast RAID array
that does a very good job of handling parallel writes.

* There's essentially zero performance benefit to having your holding
disk on a separate partition from your final storage unless you have
it on a completely separate disk.  There are some benefits in terms of
reliability, but realizing them requires some significant planning (you
have to figure out exactly what amount of space your holding disk will
need).

* If you're indexing the backups, store the working index directory
(the one Amanda actually reads and writes to) on a separate drive from
the holding disk and the final backup storage.
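[Ed.: put into amanda.conf terms, the vtape-size, part_size, and chunk-size
suggestions above might look like the sketch below.  The tapetype name and
holding-disk path are placeholders; the numbers are simply the ones
discussed in this thread.]

```
# amanda.conf fragment -- values taken from the discussion above
define tapetype VTAPE {
    length 5 tbytes        # thin-provisioned, so larger than physical space
    part_size 4 gbytes     # split dumps into 4 GB parts on the vtape
}
tapetype VTAPE

holdingdisk hd1 {
    directory "/backups/holding"
    chunksize 4 Gb         # matched to part_size, as suggested above
}
```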