Robert Ullmann wrote:
> Hi,
> 
> Maybe I should offer a constructive suggestion?

They are better than rants :)

> Clearly, trying to do these dumps (particularly "history" dumps) as it
> is being done from the servers is proving hard to manage.
> 
> I also realize that you can't just put the set of daily
> permanent-media backups online, as they contain lots of user info,
> plus deleted and oversighted revs, etc.
> 
> But would it be possible to put each backup disc (before sending one
> of the several copies off to its secure storage) in a machine that
> would filter all the content into a public file (or files)? Then
> someone else could download each disc (i.e. a 10-15 GB chunk of
> updates) and sort it into the useful files for general download?

I don't think they move backup copies off to secure storage. They have
the db replicated, and any backup discs would just be copies of that
same data. (Could a sysadmin confirm?)

> Then someone could produce a current (for example) English 'pedia XML
> file, and with more work the cumulative history files (if we want the
> history as one file).
> 
> There would be delays: each of your permanent-media backup discs has
> to be loaded onto the "filter" system (probably manually, though disc
> changers are available), and I don't know how many discs WMF generates
> per day (;-), and then the system has to filter all the revision data,
> etc. But the data would still be available to others within 48-72
> hours, which beats the present ~6 weeks when the dumps are working.
> 
> No shortage of people with a box or two and any number of Tbyte hard
> drives who might be willing to help, if they could get the raw backups.

The problem is that WMF can't provide that raw, unfiltered information.
Perhaps you could donate a box on the condition that it be used only
for dump processing, but giving unfiltered data to outside parties
would be too risky.
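
Just to make the point concrete: the filtering step itself is the easy
part. Here's a rough sketch in Python of what that "filter box" would
have to do per revision. Everything in it (the record layout, the flag
names, the private fields) is made up for illustration, only loosely
modeled on MediaWiki's rev_deleted bitfield, and is not the actual
schema or dump format:

#!/usr/bin/env python
# Rough sketch of the privacy filter such a box would need to run.
# Flag names are invented, loosely modeled on MediaWiki's rev_deleted
# bitfield; this is not the real schema or the real backup format.

import json
import sys

DELETED_TEXT = 1        # revision text hidden
DELETED_COMMENT = 2     # edit summary hidden
DELETED_USER = 4        # username hidden
DELETED_RESTRICTED = 8  # suppressed/oversighted

PRIVATE_FIELDS = ("user_email", "user_ip", "user_agent")  # hypothetical

def filter_revision(rev):
    """Return a publishable copy of one revision record, or None to drop it."""
    flags = rev.get("deleted", 0)
    if flags & DELETED_RESTRICTED:
        return None  # oversighted revs must never leave the cluster
    out = dict(rev)
    for field in PRIVATE_FIELDS:
        out.pop(field, None)  # private regardless of flags
    # Blank selectively-deleted parts rather than dropping the revision,
    # so the public file keeps a contiguous revision sequence.
    if flags & DELETED_TEXT:
        out["text"] = None
    if flags & DELETED_COMMENT:
        out["comment"] = None
    if flags & DELETED_USER:
        out["user"] = None
    return out

if __name__ == "__main__":
    # One JSON revision record per line in, filtered records out; a real
    # filter would of course have to parse the backup format itself.
    for line in sys.stdin:
        rev = filter_revision(json.loads(line))
        if rev is not None:
            sys.stdout.write(json.dumps(rev) + "\n")

The hard part isn't this logic; it's that the filter has to run inside
WMF's trust boundary, before anything is written to a disc that leaves
the building.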


