I will post a full response in the next few days when I get a chance; I am away at the moment.
git is quite a fast version control system, but it raises its own issues. As it is distributed, there is no way to lock files as you can with SVN, which could create a number of issues when multiple people commit the same files. I have yet to find a version control system that handles binary files in a very elegant manner.

The approach I am quite liking for larger projects is to create branches containing the appropriate files as they are needed, and then check out that branch so that you only get the files you need to work on. When you are done working, the branch is merged back into trunk. If you want to avoid conflicts you could lock the files on trunk. In that case you may also want a network share that always has a checkout of the up-to-date trunk.

Regards,
Zac

On Mon, Jan 12, 2009 at 5:42 PM, Chadrik <[email protected]> wrote:
>
> i spent some time over the past few weeks researching various open-source version control apps for use in vfx. thought i'd throw you all an update with my findings. as i explored different options and thought about the big picture, i came up with some features that i considered necessary and/or preferable.
>
> ---prerequisites---
> free or very cheap ( perforce is $900/user x 100 users = $90,000 = non-option )
> cross platform
> python api
> fast performance with binary files
> configurable to conserve disk space
>   - ability to easily remove unneeded files from the repo (aka 'obliterate')
>   - limited file redundancy
>
> ---bonus---
> no recursive special directories ( like .svn directories )
>
> much of the prereqs are based around the notion that we'll be dealing with some very large files. we want to avoid replicating them all over our server, because redundancy is a waste of disk space, network traffic, and copy time.
>
> so, what were my conclusions? subversion simply won't work. here's why:
>
> while subversion's python api seems quite top notch, subversion itself fails pretty miserably when it comes to binary performance and disk space usage. it stores all files in the repo using a delta algorithm, meaning each file is stored not as a whole file, but as the difference between itself and the previous commit. this has the advantage of saving disk space and of always having the diff on hand. however, calculating a delta for many large binary files -- and then later merging deltas to reform complete files -- takes prohibitively (read: insanely) long. take a look at this article for some performance tips and figures: http://www.ibm.com/developerworks/java/library/j-svnbins.html. unfortunately, their solution is to use svn's import and export commands, which store and retrieve binary files whole and uncompressed. the problem is that you don't get any version control on those files, so what's the bloody point?
>
> the second major failing is disk space usage. the delta algorithm saves space, but that savings is far outweighed by several failings. first of all, every file you check out is stored twice. yep, EVERY file. in addition to your working copy, it keeps an extra copy in the .svn directory so that IF you edit the file you can do a quick, offline diff. there's no way to turn off this "feature". so, if you're checking out 500GB of data, it's gonna be more like 1TB. all that extra disk space used up in every working copy is of almost no benefit, because diffs between binary files are useless without a custom app to interpret the data.
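To put a rough number on that doubling, here is a small Python sketch that splits a checkout's disk usage into working files versus the pristine copies subversion keeps for offline diffs. It assumes the pre-1.7 working-copy layout, where every directory carries its own .svn folder; treat it as an illustration, not a tool.

    import os

    def dir_size(path):
        """Total bytes of all regular files under path."""
        total = 0
        for root, dirs, files in os.walk(path):
            for name in files:
                fp = os.path.join(root, name)
                if os.path.isfile(fp):
                    total += os.path.getsize(fp)
        return total

    def checkout_usage(working_copy):
        """Return (working_bytes, svn_bytes) for a pre-1.7 svn checkout."""
        working = pristine = 0
        for root, dirs, files in os.walk(working_copy):
            if '.svn' in dirs:
                # pristine copies live under .svn/text-base in this layout
                pristine += dir_size(os.path.join(root, '.svn'))
                dirs.remove('.svn')  # don't descend into it again below
            for name in files:
                fp = os.path.join(root, name)
                if os.path.isfile(fp):
                    working += os.path.getsize(fp)
        return working, pristine

    if __name__ == '__main__':
        import sys
        w, p = checkout_usage(sys.argv[1])
        print('working files: %.1f GB' % (w / 1e9))
        print('.svn overhead: %.1f GB' % (p / 1e9))

On a checkout made up mostly of large binaries the two figures should come out nearly equal, which is the 500GB-becomes-1TB effect described above.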
> last in the disk space category: if a user accidentally checks in 100GB of cache data, or let's say your repo is getting very large and you want to wipe out some old versions of an asset that you know aren't being used, you cannot do so without going through some extreme pain. you have to use `svnadmin dump` to dump your entire repo to a text file, then use svndumpfilter to filter through your data and remove what you don't want, then rebuild your repo. this process can take many hours if your repo is very large.
>
> the last part is a pet peeve, and that's the recursive .svn directories. these are annoying to deal with because if you decide to switch out some directories in your working copy with some others of the same name and you expect it to simply use the new ones in their place, it won't work. you have to copy over all the .svn folders from the original into the new set. imagine how well this will work with artists! you would have to write scripts for moving and modifying these .svn directories, and the artists would have to reliably use them instead of just dragging and dropping directories, or the system would break down.
>
> i was pretty disappointed to finally come to this conclusion about subversion, but the fact is that it does what it's meant to do well, and managing large binary datasets is not what it's meant to do. so, i moved on and began applying my criteria to pretty much every revision control system i could find ( using this list: http://en.wikipedia.org/wiki/Comparison_of_revision_control_software ). most are cvs/svn derivatives with no real advantage in feature set. i ran away from anything that used delta compression on binary files, and at first i shied away from distributed systems because of what i read in the mercurial manual:
>
> "Because Subversion doesn't store revision history on the client, it is well suited to managing projects that deal with lots of large, opaque binary files. If you check in fifty revisions to an incompressible 10MB file, Subversion's client-side space usage stays constant. The space used by any distributed SCM will grow rapidly in proportion to the number of revisions, because the differences between each revision are large."
>
> essentially, if you have a 500GB repo, then that 500GB is copied to every working copy. ie: mercurial is worse than subversion with binary files ( and subversion is already pretty bad with binary files ). i shouldn't write off mercurial, though, because with the right features it still might be viable; as i shortly discovered, my favorite option ended up being a distributed system....
>
> that system is "git". so far, i think it has the most potential of anything i've seen. it's distributed, but very flexible, has many different models for revision control, plus a lot of options to help save disk space / network traffic. it can even be configured to work like cvs/svn, if that is your desire. the project was started by Linus Torvalds, and as he put it: "It's not an SCM, it's a distribution and archival mechanism. I bet you could make a reasonable SCM on top of it, though. Another way of looking at it is to say that it's really a content-addressable filesystem, used to track directory trees." ( taken from this helpful site: http://utsl.gen.nz/talks/git-svn/intro.html )
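That "content-addressable filesystem" description is easy to make concrete: git names a blob by the SHA-1 of a short header plus the file's raw bytes, so the same content always maps to the same object no matter how many paths or revisions point at it. A minimal Python sketch of the blob case:

    import hashlib

    def git_blob_id(data):
        """Object id git assigns to raw content when stored as a blob."""
        header = ('blob %d\0' % len(data)).encode()
        return hashlib.sha1(header + data).hexdigest()

    # Same bytes -> same id, regardless of path, revision, or repository,
    # which is what makes the object store "content-addressable".
    print(git_blob_id(b'hello\n'))  # ce013625030ba8dba906f756967f9e9ca394464a

Whole files are stored (zlib-compressed) under that id, which is also why renames and duplicate copies of a big file cost essentially nothing extra in the object store.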
> the python api is provided by a 3rd party, which is a bit disappointing (ironic, coming from the guy who started pymel), but it exists and looks object-oriented enough. git doesn't use delta-compression, the amount of history copied from a repo can be limited or even shared via hard links, it has the ability to prune old commits, it has an option to pack away commits that are no longer used with even greater compression, and it doesn't use annoying recursive directories.
>
> i haven't begun using git in a real-world test yet, but if you're looking for something to base a pipe on, this could be the horse to bet on. ultimately, i would really like to start an open-source asset management project, so take a look at git and see what you think. i'll let you know as i find out more. i haven't done a speed test on a large image sequence yet, and that could still be a deal-breaker, but so far it "feels" fast.
>
> -chad
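The space-saving options mentioned above map onto a handful of stock git commands, so they can be scripted even without the third-party python bindings. A rough sketch driving git through subprocess; the repository paths and names here are made up purely for illustration:

    import subprocess

    def git(args, cwd=None):
        """Tiny helper (for this sketch only): run a git command and return its output."""
        return subprocess.check_output(['git'] + args, cwd=cwd)

    # Limit how much history a clone carries around (shallow clone of a remote repo).
    git(['clone', '--depth', '1', 'git://server/show.git', 'show-shallow'])

    # A plain clone of a repo on the same local filesystem shares its object
    # store via hard links, so a second copy costs very little extra disk.
    git(['clone', '/repos/show.git', 'show-full'])

    # Repack loose objects and drop unreachable ones (e.g. old, pruned commits).
    git(['gc', '--aggressive', '--prune=now'], cwd='show-full')

    # Report how much the object store actually occupies.
    print(git(['count-objects', '-v'], cwd='show-full').decode())

This hasn't been tried against a production-sized repo; it is only meant to show that the knobs described above are ordinary command-line options rather than anything exotic.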
