Robert Ullmann wrote:
> Hi,
>
> Maybe I should offer a constructive suggestion?

They are better than rants :)

> Clearly, trying to do these dumps (particularly "history" dumps) as it
> is being done from the servers is proving hard to manage
>
> I also realize that you can't just put the set of daily
> permanent-media backups on line, as they contain lots of user info,
> plus deleted and oversighted revs, etc.
>
> But would it be possible to put each backup disc (before sending one
> of the several copies off to its secure storage) in a machine that
> would filter all the content into a public file (or files)? Then
> someone else could download each disc (i.e. a 10-15 GB chunk of
> updates) and sort it into the useful files for general download?

I don't think they move backup copies off to secure storage. They have
the db replicated, and the backup discs would be copies of those same
dumps. (Some sysadmin to confirm?)

> Then someone can produce a current (for example) English 'pedia XML
> file; and with more work the cumulative history files (if we want that
> as one file).
>
> There would be delays: each of your permanent-media backup discs has
> to be (probably manually, but changers are available) loaded on the
> "filter" system, and I don't know how many discs WMF generates per
> day. (;-) And then it has to filter all the revision data etc. But it
> still would easily be available for others in 48-72 hours, which beats
> the present ~6 weeks when the dumps are working.
>
> No shortage of people with a box or two and any number of Tbyte hard
> drives that might be willing to help, if they can get the raw backups.

The problem is that WMF can't provide that raw, unfiltered information.
Perhaps you could donate a box on the condition that it could only be
used for dump processing, but giving out unfiltered data would be too
risky.
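For what it's worth, the filtering step itself is conceptually just a
streaming pass over the revision data, something along these lines
(this is only a sketch: the <page>/<revision> layout and the
"suppressed" attribute here are illustrative placeholders, not the real
MediaWiki export schema, and a real filter would also have to strip
user info and handle compression):

#!/usr/bin/env python3
"""Sketch: stream a simplified dump-style XML file and drop any
revision carrying a suppression flag before re-emitting the rest.
Element and attribute names are assumptions for illustration."""

import sys
import xml.etree.ElementTree as ET


def filter_dump(in_path, out_path):
    with open(out_path, "wb") as out:
        out.write(b'<?xml version="1.0" encoding="utf-8"?>\n<pages>\n')
        # iterparse streams the input, so a disc's worth of XML never
        # has to fit in RAM at once
        for _event, page in ET.iterparse(in_path, events=("end",)):
            if page.tag != "page":
                continue
            # drop anything flagged as suppressed/oversighted
            # (flag name is assumed for this sketch)
            for rev in list(page.findall("revision")):
                if rev.get("suppressed") == "true":
                    page.remove(rev)
            out.write(ET.tostring(page, encoding="utf-8"))
            out.write(b"\n")
            page.clear()  # release the page we just wrote
        out.write(b"</pages>\n")


if __name__ == "__main__":
    filter_dump(sys.argv[1], sys.argv[2])

Run as "python filter_dump.py raw.xml public.xml". The hard part isn't
the pass itself; it's that the machine doing it has to be trusted with
the raw data in the first place, which is exactly the problem above.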
