Anthony, the process is linear: you have a PHP script inserting X rows per Y time frame. Yes, rebuilding the externallinks, links, and langlinks tables will take some additional time and won't scale. However, I have been working with the Toolserver since 2007, and I've lost count of the number of times the TS has needed to re-import a cluster (s1-s7); even enwiki can be done in a semi-reasonable timeframe.

The WMF actually compresses all text blobs, not just old revisions. A complete download and decompression of simple took only 20 minutes on my two-year-old consumer-grade laptop with a standard home cable internet connection; the same download on the Toolserver (minus decompression) took 88 seconds. Importing will take a little longer, but it shouldn't be that big of a deal. There will also be some cleanup tasks needed afterwards. The main issue, though — archiving and restoring WMF wikis — isn't an issue, and with moderately recent hardware is no big deal.

I'm putting my money where my mouth is and getting actual, valid stats and figures. It may not be an exact 1:1 ratio when scaling up, but given the basics of how importing a dump functions, it should remain close to the same ratio.
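The back-of-the-envelope math behind that linear-rate argument can be sketched as follows. This is illustrative only: the revision counts and the revisions-per-second rate are hypothetical numbers, not measurements from the Toolserver or any WMF dump.

```python
def estimated_import_hours(revisions, revisions_per_second):
    """Estimate wall-clock import time, assuming a roughly constant
    per-revision insert rate on fixed hardware (the claim above)."""
    return revisions / revisions_per_second / 3600

# Hypothetical figures: if a small wiki imports at 2,000 revisions/sec,
# a wiki with 100x the revisions at the same rate takes ~100x as long.
# The rate is the constant; total size only stretches the clock.
small = estimated_import_hours(7_000_000, 2_000)    # small wiki
large = estimated_import_hours(700_000_000, 2_000)  # 100x the revisions
print(round(large / small))                         # scaling ratio
```

Whether the real rate stays constant at enwiki scale (index maintenance, I/O contention) is exactly what the stats being gathered should settle.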
On Thu, May 17, 2012 at 12:54 AM, Anthony <wikim...@inbox.org> wrote:
> On Thu, May 17, 2012 at 12:45 AM, John <phoenixoverr...@gmail.com> wrote:
> > Simple.wikipedia is nothing like en.wikipedia
>
> > I care to dispute that statement. All WMF wikis are set up basically the
> > same (an odd extension here or there, and different namespace names at
> > times), but for the purpose of recovery simplewiki_p is a very standard
> > example. This issue isn't just about enwiki_p but *all* WMF wikis. Doing
> > a data recovery for enwiki vs. simplewiki is just a matter of time; for
> > enwiki a 5-day estimate would be fairly standard (depending on server
> > setup), with lower times for smaller databases. Typically you can express
> > it as a rate of X revisions processed per Y time unit, regardless of the
> > project, and that rate should be similar for everything given the same
> > hardware setup.
>
> Are you compressing old revisions, or not? Does the WMF database
> compress old revisions, or not?
>
> In any case, I'm sorry, a 20 gig mysql database does not scale
> linearly to a 20 terabyte mysql database.

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l