On Thu, Jun 7, 2018 at 3:04 AM Stefan Sperling <s...@elego.de> wrote:
>
> On Wed, Jun 06, 2018 at 03:12:20PM -0400, Alfred von Campe wrote:
> > I’m trying to remove two sensitive directories from a repo so we can have a
> > 3rd party work on it. I first dumped the entire repo, and now I’m trying
> > to remove two directories from one particular branch. But svndumpfilter
> > keeps failing as follows:
> >
> > $ svndumpfilter exclude branches/develop/dir1 branches/develop/dir2 < repo.dump > repo-nodir12.dump
> > svndumpfilter: E200003: Invalid copy source path '/branches/develop/dir2'
> >
> > I’ve tried this both from a full incremental dump of the repo as well as a
> > non-incremental dump of the repo starting from the revision that
> > branches/develop was created. It always fails after the exact same
> > revision.
> >
> > Is there anything I can do to work around this issue?
> >
> > Alfred
>
> Yes, you can update to 1.10 and use svnadmin dump --exclude
> instead of using svndumpfilter.
> See
> http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
>
> An alternative that works with earlier releases is to set up svnsync
> replication and configure authz access rules for the sync user which
> forbid read access to the paths you want to exclude. svnsync will deal
> with missing copy sources by translating copies into additions.
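For the archive, here is a rough sketch of what the 1.10 suggestion above might look like when wrapped in a small shell function. The function name, the SVNADMIN override, and the leading-slash normalization are my own assumptions for illustration; check "svnadmin help dump" on your 1.10 install for the exact path form it expects.

```shell
#!/bin/sh
# Sketch only: dump a repository while excluding some paths, using the
# --exclude option that svnadmin dump gained in Subversion 1.10.
# dump_excluding and SVNADMIN are hypothetical names, not part of Subversion.

# dump_excluding REPO OUTFILE PATH...
dump_excluding() {
    repo=$1
    out=$2
    shift 2
    n=$#
    # Append one "--exclude /PATH" pair per path. The leading slash is an
    # assumption about the expected path form.
    for p in "$@"; do
        set -- "$@" --exclude "/${p#/}"
    done
    # Drop the original bare paths, keeping only the --exclude pairs.
    shift "$n"
    ${SVNADMIN:-svnadmin} dump "$repo" "$@" > "$out"
}

# Example call (paths taken from Alfred's message, repository path made up):
# dump_excluding /srv/svn/repo repo-nodir12.dump \
#     branches/develop/dir1 branches/develop/dir2
```

The resulting dump can then be loaded into a freshly created repository with "svnadmin load" as usual.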
There is also a fairly nasty and somewhat hazardous trick I've used effectively a few times to clean up a historically messy SVN layout. Import it to git with git svn, ruthlessly trim debris branches, tags, and out-of-band content, use "git gc --aggressive" to flush the pruned objects and branches *from the history*, then export the result with git svn into a new Subversion repository.

There are risks: git doesn't handle keywords the same way Subversion does, for example, so the transfer needs to be reviewed carefully for svn:keywords, svn:ignore, and svn:eol-style handling. But when you have a messy Subversion layout where people dumped oddly named branches or parts of branches in weird locations, or accidentally committed bulky binary files and left copies scattered around the history, it can be an invaluable cleanup tool. It also doesn't require access to the Subversion server to run "svnadmin dump", and it can be updated from the live Subversion master.

Part of the key is using "git gc --aggressive" to flush the pruned content from history. (On its own, gc keeps unreachable objects for a grace period and for as long as reflogs still reference them, so expire the reflogs and add "--prune=now" if the content has to be gone right away.)

Yes, this flushes history, and is considered a sin, Sin, ***SIN*** by those who consider a complete and pristine history of the entire source tree the whole point of a source control system. But in practice, most branches and tags are pointless after long enough, and it only takes a few accidental commits of bulky binaries or of inappropriately imported content to clutter and even legally encumber a source control system. Like any pruning of history, it needs to be done cautiously or important material can be lost.
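A minimal sketch of the prune-and-flush half of that trick, assuming the repository has already been imported with something like "git svn clone --stdlayout URL workdir". The helper names, the DRY_RUN switch, and the example ref name are hypothetical; where git svn puts its remote refs depends on your git version and clone options, so list them with "git for-each-ref" first. The export back into a new Subversion repository is deliberately left out.

```shell
#!/bin/sh
# Sketch only: delete unwanted refs from a git svn import and discard the
# objects that only those refs kept alive.
# run, prune_import and DRY_RUN are hypothetical names for illustration.

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prune_import WORKDIR REF...
prune_import() {
    workdir=$1
    shift
    for ref in "$@"; do
        # Remove the ref itself, e.g. refs/remotes/origin/old-junk-branch
        run git -C "$workdir" update-ref -d "$ref"
    done
    # Reflog entries keep objects reachable, so expire them first.
    run git -C "$workdir" reflog expire --expire=now --all
    # Repack and drop unreachable objects immediately instead of waiting
    # out the default grace period.
    run git -C "$workdir" gc --aggressive --prune=now
}

# Example call (ref name made up):
# DRY_RUN=1 prune_import ./workdir refs/remotes/origin/old-junk-branch
```

Running it with DRY_RUN set first is a cheap way to review exactly what will be deleted, which matters because nothing here can be undone once gc has pruned.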