Nick,

I’m working with many small databases and haven’t noticed any problems with 
the default rsync behavior (I use the options -avHS: archive mode, verbose, 
preserve hard links, and handle sparse files efficiently). A couple of large 
databases are backed up the same way, and I haven’t seen performance 
problems with those either.
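
For reference, the invocation is along these lines (the paths are 
placeholders, not my actual layout):

    rsync -avHS /var/lib/couchdb/ backup-host:/backups/couchdb/

By default rsync skips files whose size and modification time are unchanged 
and only runs its delta-transfer algorithm on files that actually differ, so 
a run's cost scales with how much changed rather than with the total size of 
the data set. (The delta transfer does read a changed file in full on both 
ends, which matches what you saw with the large databases.)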

Whatever option you choose, I think it’s really important to also have a 
simple/primitive non-CouchDB backup solution in case your specialized use of 
CouchDB has unforeseen side effects. Using a hosted service that is isolated 
from your current hosting solution is also important for the same reasons. Be 
conservative when it comes to safeguarding data.
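
For what it’s worth, here’s a minimal sketch in Python of the 
_db_updates-driven approach Jan describes below. The host names are 
placeholders, and a real version would need auth, error handling, and 
deduplication of in-flight replications:

    # Sketch: fire a one-shot replication whenever _db_updates reports a change.
    import json
    import requests  # third-party HTTP client (pip install requests)

    SOURCE = "http://localhost:5984"    # local CouchDB (placeholder)
    TARGET = "http://backup-host:5984"  # off-site CouchDB (placeholder)

    def watch_and_replicate():
        # Stream the global updates feed; one JSON object per line.
        feed = requests.get(SOURCE + "/_db_updates",
                            params={"feed": "continuous", "heartbeat": "10000"},
                            stream=True)
        for line in feed.iter_lines():
            if not line:
                continue  # heartbeat keep-alive
            event = json.loads(line)
            if event.get("type") in ("created", "updated"):
                db = event["db_name"]
                # Non-continuous replication: runs until caught up, then ends.
                requests.post(SOURCE + "/_replicate",
                              json={"source": SOURCE + "/" + db,
                                    "target": TARGET + "/" + db,
                                    "create_target": True})

    if __name__ == "__main__":
        watch_and_replicate()

Note that POST /_replicate blocks until the one-shot replication finishes, so 
this processes events serially; a small worker pool would keep busy databases 
from delaying each other.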

-- 
Paul Okstad
http://pokstad.com

> On Mar 10, 2016, at 2:23 PM, Nick Wood <[email protected]> wrote:
> 
> Thanks for the suggestions.
> 
> @Paul, I did some basic testing with rsync. It seems like the checksumming
> isn't super efficient if I'm trying to maintain up-to-the-minute backups of
> large databases (> 1GB), because the whole file has to be read to checksum
> it. I could just append without checksumming, but that doesn't feel safe. Can
> I ask whether you have rsync do checksums/verifying or are just appending,
> and whether you've ever had data corruption issues?
> 
> @Jan, I'm giving that a shot. So far so good. The backups aren't quite as
> current as they would be if continuous replication worked, but this
> seems more efficient than rsync. Might have a winner if the crashes stay
> away.
> 
>  Nick
> 
> On Thu, Mar 10, 2016 at 1:18 AM, Jan Lehnardt <[email protected]> wrote:
> 
>> 
>>> On 09 Mar 2016, at 21:29, Nick Wood <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm looking to back up a CouchDB server with multiple databases. Currently
>>> 1,400, but it fluctuates up and down throughout the day as new databases
>>> are added and old ones deleted. ~10% of the databases are written to within
>>> any 5 minute period of time.
>>> 
>>> Goals
>>> - Maintain a continual off-site snapshot of all databases, preferably no
>>> older than a few seconds (or minutes)
>>> - Be efficient with bandwidth (i.e. not copy the whole database file for
>>> every backup run)
>>> 
>>> My current solution watches the global _changes feed and fires up a
>>> continuous replication to an off-site server whenever it sees a change. If
>>> it doesn't see a change from a database for 10 minutes, it kills that
>>> replication. This means I only have ~150 active replications running on
>>> average at any given time.
>> 
>> How about instead of using continuous replications and killing them,
>> use non-continuous replications based on _db_updates? They end
>> automatically and should use fewer resources then.
>> 
>> Best
>> Jan
>> --
>> 
>>> 
>>> I thought this was a pretty clever approach, but I can't stabilize it.
>>> Replications hang frequently with crashes in the log file. I haven't yet
>>> tracked down the source of the crashes. I'm running the official 1.6.1
>>> Docker image as of yesterday, so I don't think it would be an Erlang
>>> issue.
>>> 
>>> Rather than keep banging my head against these stability issues, I thought
>>> I'd ask to see if anyone else has come up with a clever backup solution
>>> that meets the above goals?
>>> 
>>> Nick
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/