Hi,

On Mon, Mar 12, 2012 at 12:05 PM, Marcel Reutegger <[email protected]> wrote:
>> 2) Assuming we get the clustering architecture right (which we
>> should), it should be possible to start a new read-only node to an
>> existing cluster, wait for it to synchronize all existing content from
>> the rest of the cluster, and finally stop this backup node. The result
>> should be a complete, runnable copy of the repository.
>
> this however also assumes that each cluster node has a complete copy
> of the repository. I'd rather be in favor of a cluster solution that
> distributes the repository across multiple machines for improved
> scalability.

Right, sharding adds complexity. Once a repository grows large enough
to require sharding, the traditional approach of backing it up to a
single backup server or tape no longer works. In that case the backup
itself probably needs to be sharded as well, which is likely best
achieved by starting a full set of shards for the backup instead of
just a single node.

BR,

Jukka Zitting
