On Thu, Apr 23, 2009 at 9:29 AM, <[email protected]> wrote:
> Quoting Lukasz Szybalski <[email protected]>:
>
>> I guess the question would be: Could you describe the type of data you
>> currently have. (percentage of space, downloads, changes)
>
> This is the directory that has broken the system (watch out-- it may
> break your browser):
>
> http://ukparse.kforge.net/svn/undata/pdf/
>
> It's several thousand large PDFs of UN documents. The same would apply
> to scanned images, archived pages from Hansard, etc.
>
> At the moment I'm storing it in SVN as a means of distribution, but it
> unnecessarily doubles the disk usage, and some of the SVN clients are
> very unhappy with the size of the directory.
So for this particular case, is size the problem?

1. I wonder if using a distributed repository would help (bzr, git, hg).
(Not sure if that will help with size.) One problem with revision control
is that it does not handle binary files well, so every commit that changes
one may store a new full version instead of a diff. (I think that is the
case.)

2. If you want to bypass the repository, you could use a file system like
zfs (instead of ext3), which has a few extra features such as snapshots.
(Not sure if that would do it.) You could look at the history of snapshots
to see previous versions.

As far as a redundant file system with multiple nodes goes, the only one I
know of right now is Google File System, but no source is available for
that. So a set of http/ftp mirrors will have to do, with rsync from the
main server.

1 TB of space is not that expensive, unless you guys host this in some
kind of "server hosting" environment.

Lucas

> SVN is entirely inappropriate for these large binary files (there are no
> versions), but it's convenient only because the code that handles these
> binary files is in SVN (where it belongs), and the fewer means of
> distribution the better. But it's not scaling any more.

When you say scaling, what do you mean?

> We need a better answer for parking the data for these projects, where
> we'd keep the scraping/parsing code in SVN on kforge (SVN is designed
> for code), and handle these large sets of large non-versioned files some
> other way.
>
> Julian.
>
> _______________________________________________
> okfn-discuss mailing list
> [email protected]
> http://lists.okfn.org/cgi-bin/mailman/listinfo/okfn-discuss

-- 
How to create python package?
http://lucasmanual.com/mywiki/PythonPaste
DataHub - create a package that gets, parses, loads, visualizes data
http://lucasmanual.com/mywiki/DataHub
