https://bugzilla.wikimedia.org/show_bug.cgi?id=18104

--- Comment #7 from FT2 <ft2.w...@gmail.com> 2010-02-05 09:42:08 UTC ---
OverlordQ and I took a look at this on the toolserver. Some of this may be
obvious or well known; I don't know how much 2005-era MediaWiki history counts
as "common knowledge".

The latest enwiki deleted revision with no rev_id is timestamped 20050627053602
(June 27 2005, 05:36:02), as Aaron and Graham said. In total, 511,728 deleted
revisions have no rev_id.

(About 2,356 revisions also have the same rev_id in both the current and
deleted revisions tables. This is presumably due to inconsistencies in old data.)
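Counting that overlap is a set intersection on rev_id that ignores NULLs. Here is a minimal Python sketch. The sample values are made up and are not enwiki data, and the variable names are hypothetical stand-ins for the rev_id columns of the two tables:

```python
# Hypothetical in-memory stand-ins for the rev_id columns of the current
# and deleted revisions tables; real data would come from the database.
current_rev_ids = {101, 102, 104, 107}
deleted_rev_ids = [103, 104, None, 107, None]  # None models rev_id = NULL


def shared_rev_ids(current, deleted):
    """Return, sorted, the rev_ids present in both tables, skipping NULLs."""
    return sorted(set(current) & {r for r in deleted if r is not None})


print(shared_rev_ids(current_rev_ids, deleted_rev_ids))  # [104, 107]
```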

Deleted revisions from before June 2005 that were later restored apparently
got allocated a new rev_id, so a higher rev_id can carry an older date. E.g.,
compare the dates for enwiki rev_ids 15700000 (June 14 2005), 15700001 (June 9
2005), 16300000 (May 1 2005), and 17000001 (Dec 8 2004). It doesn't seem to
cause problems, though.
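Reallocation shows up when rev_id order disagrees with date order. The small Python check below runs over the four sample points quoted above, using dates only. It flags each adjacent pair where the higher rev_id carries the older date. The function name is hypothetical:

```python
from datetime import date

# (rev_id, revision date) pairs quoted above for enwiki.
samples = [
    (15700000, date(2005, 6, 14)),
    (15700001, date(2005, 6, 9)),
    (16300000, date(2005, 5, 1)),
    (17000001, date(2004, 12, 8)),
]


def timestamp_inversions(pairs):
    """Return adjacent (lower_id, higher_id) pairs, in rev_id order, where the
    higher rev_id has an earlier date than the lower one."""
    ordered = sorted(pairs)
    return [
        (a_id, b_id)
        for (a_id, a_date), (b_id, b_date) in zip(ordered, ordered[1:])
        if b_date < a_date
    ]


print(timestamp_inversions(samples))
# [(15700000, 15700001), (15700001, 16300000), (16300000, 17000001)]
```

Every adjacent pair is inverted, which fits these ids being reassigned on restore rather than allocated in sequence.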

Because rev_ids were reallocated, the date on any single rev_id can mislead,
so you have to step back and forth 50 or 100 ids at a time to get an idea of
which rev_id had been reached at roughly what time. Doing that suggests there
were about 17,739,500 (~17.74 m) revisions on enwiki prior to the changeover of
June 2005.
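The manual back-and-forth probing could be made systematic. One swapped-in approach, not necessarily what was done here, is a binary search over rev_id that compares a windowed median timestamp instead of a single id's timestamp. A few reallocated outliers then barely move the comparison. This is a minimal sketch under stated assumptions. `lookup` is a hypothetical stand-in for a database query that returns a 14-digit MediaWiki timestamp, or None for a missing id. Fixed-width timestamps compare chronologically as strings. The smoothed timestamps are assumed never to decrease as rev_id grows. The default window of 50 simply echoes the step size mentioned above.

```python
from statistics import median_low
from typing import Callable, Optional

# Hypothetical lookup: rev_id -> 14-digit timestamp (YYYYMMDDHHMMSS) or None.
Lookup = Callable[[int], Optional[str]]


def local_median_ts(rev_id: int, lookup: Lookup, window: int = 50) -> Optional[str]:
    """Median timestamp of the existing ids within +/- window of rev_id.

    A handful of reallocated ids with out-of-place timestamps barely moves
    the median, unlike a single-id probe.
    """
    stamps = [ts for r in range(rev_id - window, rev_id + window + 1)
              if (ts := lookup(r)) is not None]
    return median_low(stamps) if stamps else None


def last_id_before(cutoff: str, lo: int, hi: int, lookup: Lookup, window: int = 50) -> int:
    """Largest rev_id in [lo, hi] whose smoothed timestamp is <= cutoff.

    Assumes smoothed timestamps are non-decreasing in rev_id; returns lo - 1
    if no id qualifies. A window with no existing ids counts as 'after'.
    """
    best = lo - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        ts = local_median_ts(mid, lookup, window)
        if ts is not None and ts <= cutoff:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best


# Synthetic demo: ids 1..1000 with steadily rising fixed-width "timestamps",
# plus three ids that were "reallocated" and carry very old timestamps.
demo = {i: f"{i:014d}" for i in range(1, 1001)}
for i in (600, 601, 602):
    demo[i] = "0" * 14
print(last_id_before(f"{650:014d}", 1, 1000, demo.get, window=10))  # 650
```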

Oversight and developer deletions would have been negligible up to that point.
So in principle there were approximately 17.74 m revisions prior to the
changeover, of which 17,043,322 can be traced to live data, leaving about 696k
revision ids untraced.

The conclusion is that the ~511k old deleted revisions with rev_id = NULL can
be sequenced into the ~17.74 m rev_ids known to exist prior to the changeover:
there are ~696k untraced rev_ids which they could map into. (The explanation for
the remaining ~185k isn't clear. Perhaps delete/restore activity on old
revisions?)
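The arithmetic behind the 696k and 185k figures uses only the numbers quoted above. The 17,739,500 starting point is itself an estimate, so the second result comes out near 184k rather than exactly the rounded 185k:

```python
# Figures quoted above (the first one is an estimate).
revisions_at_changeover = 17_739_500  # rev_ids used before June 2005
traced_to_live_data = 17_043_322      # rev_ids still found in live data
null_rev_id_deleted = 511_728         # deleted revisions with rev_id = NULL

untraced = revisions_at_changeover - traced_to_live_data
unexplained = untraced - null_rev_id_deleted

print(untraced)     # 696178
print(unexplained)  # 184450
```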

It looks like all the deleted revisions with a NULL rev_id can be matched fairly
accurately, in time order, against existing gaps in the current revisions, and
each assigned a suitable rev_id that isn't currently taken. It might not be
perfect, but it'll be close, and allocating a time-sequenced old rev_id is
probably helpful for admins and the like.

Deletions are thoroughly interspersed with undeleted revisions, so the
surrounding live revisions pin down where each one belongs and the job doesn't
require human guesswork. It could be a one-off task suited to a script.
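Here is a minimal Python sketch of that one-off matching. It is one possible greedy approach, not a tested migration script, and it rests on these assumptions:

- Live revisions are given as (rev_id, timestamp) pairs.
- The NULL-rev_id deleted revisions are given as (key, timestamp) pairs, where `key` is a hypothetical identifier for the deleted row.
- Timestamps are 14-digit MediaWiki strings.
- `reserved` holds ids that must not be reused, for example ids already present in both tables.

The sketch walks the deleted revisions in time order. Each one gets the lowest free rev_id whose neighbouring live revisions bracket its timestamp. A revision that fits no remaining gap is left unassigned (None).

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

EARLIEST = ""              # sorts before any 14-digit timestamp
LATEST = "99999999999999"  # sorts after any 14-digit timestamp


def assign_rev_ids(
    live: Sequence[Tuple[int, str]],
    deleted: Sequence[Tuple[str, str]],
    max_id: int,
    reserved: Iterable[int] = (),
) -> Dict[str, Optional[int]]:
    """Give each NULL-rev_id deleted revision a free rev_id in time order.

    live:     (rev_id, timestamp) for revisions that already own an id.
    deleted:  (key, timestamp) for deleted revisions lacking a rev_id;
              'key' is a hypothetical row identifier.
    reserved: extra ids that must not be handed out.
    """
    live_sorted = sorted(live)
    live_ids = [rev_id for rev_id, _ in live_sorted]
    live_ts = [ts for _, ts in live_sorted]
    taken = set(live_ids) | set(reserved)

    # Every free id, tagged with the timestamps of its nearest live
    # neighbours below and above: a deleted revision fits if it falls between.
    slots: List[Tuple[int, str, str]] = []
    for free_id in range(1, max_id + 1):
        if free_id in taken:
            continue
        i = bisect_left(live_ids, free_id)  # index of first live id above free_id
        lower = live_ts[i - 1] if i > 0 else EARLIEST
        upper = live_ts[i] if i < len(live_ids) else LATEST
        slots.append((free_id, lower, upper))

    assigned: Dict[str, Optional[int]] = {}
    p = 0
    for key, ts in sorted(deleted, key=lambda row: row[1]):
        # Skip slots whose gap closes before this revision's timestamp.
        while p < len(slots) and slots[p][2] < ts:
            p += 1
        if p < len(slots) and slots[p][1] <= ts:
            assigned[key] = slots[p][0]
            p += 1
        else:
            assigned[key] = None  # no free id left in a matching gap
    return assigned


live = [(1, "20050101000000"), (4, "20050104000000"), (6, "20050106000000")]
deleted = [("a", "20050102000000"), ("b", "20050105000000"),
           ("c", "20050103000000"), ("d", "20050107000000")]
print(assign_rev_ids(live, deleted, max_id=7))
# {'a': 2, 'c': 3, 'b': 5, 'd': 7}
```

Assigned ids rise with timestamps, so the sequence stays time-ordered. Free ids left over in a gap correspond to the unexplained remainder.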

This would at least mean every deleted revision had a rev_id, which is a first
step in fixing the problems.

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
