https://bugzilla.wikimedia.org/show_bug.cgi?id=18333





--- Comment #1 from [email protected]  2009-04-04 00:35:37 UTC ---
The above will stop new duplication, and it would be a shame not to implement it.

But what about all the years and years of current duplication already
existing in one's text table?

Should there be a program in maintenance/ available to squeeze it out?

Should it also be run by update.php? Or just once in a wiki's
lifetime? Or just by interested parties who feel the need?

That program would squeeze out duplicates by:
{for each page {go down its list of revisions, making duplicate
pointers point to their first}}, then run purgeOldText.php.
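The per-page pass above can be sketched roughly as follows. This is a minimal illustration against a toy SQLite stand-in, not the real MediaWiki schema or API: the table names mimic revision(rev_id, rev_page, rev_text_id) and text(old_id, old_text), but real installations have compression flags, external storage, and more columns.

```python
import sqlite3

def dedupe_page_revisions(conn, page_id):
    """For one page, walk its revisions in order and repoint every
    duplicate revision at the first text row holding that content.
    Toy schema: revision(rev_id, rev_page, rev_text_id),
    text(old_id, old_text). Returns the number of repointed rows."""
    first_seen = {}  # content -> old_id of its first occurrence
    rows = conn.execute(
        """SELECT r.rev_id, r.rev_text_id, t.old_text
           FROM revision r JOIN text t ON t.old_id = r.rev_text_id
           WHERE r.rev_page = ? ORDER BY r.rev_id""",
        (page_id,)).fetchall()
    repointed = 0
    for rev_id, text_id, content in rows:
        canonical = first_seen.setdefault(content, text_id)
        if canonical != text_id:
            conn.execute(
                "UPDATE revision SET rev_text_id = ? WHERE rev_id = ?",
                (canonical, rev_id))
            repointed += 1
    conn.commit()
    return repointed
```

After running this for every page, a purgeOldText.php-style pass would delete the text rows no revision references any longer.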

One needn't go to "SHA1 mapping to unbloat the text table" (
http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/042373.html
) extremes.

However, perhaps we needn't restrict our thinking to a per-article
paradigm, but could instead consider the whole revision->text table
mapping. Maybe that would be a simpler and smarter way to do this.
We would thus involve only two tables... (wait, we must consider all
tables that have any mapping to the text table! Also, all this must
probably be done with the wiki locked, though it would likely take
only a few seconds for a small wiki.)
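The whole-table variant can be sketched the same way: group every text row by a content hash, keep one canonical row per distinct content, repoint all referencing revisions, and delete the orphans. Again this is a hedged illustration against the same toy schema; a real run must cover every table that references text, and the hashing here is just a convenient way to group identical rows, not the full "SHA1 mapping" scheme linked above.

```python
import hashlib
import sqlite3

def dedupe_text_table(conn):
    """Collapse identical text rows across the whole wiki: pick one
    canonical old_id per content, repoint every referencing revision,
    then delete the now-orphaned text rows. Toy schema as before.
    Returns the number of text rows removed."""
    canonical = {}  # sha1(content) -> canonical old_id
    remap = {}      # duplicate old_id -> canonical old_id
    for old_id, content in conn.execute(
            "SELECT old_id, old_text FROM text ORDER BY old_id"):
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if digest in canonical:
            remap[old_id] = canonical[digest]
        else:
            canonical[digest] = old_id
    for dup, keep in remap.items():
        conn.execute(
            "UPDATE revision SET rev_text_id = ? WHERE rev_text_id = ?",
            (keep, dup))
        conn.execute("DELETE FROM text WHERE old_id = ?", (dup,))
    conn.commit()
    return len(remap)
```

Note how this naturally handles the blank-revision case below: every 0-byte revision, regardless of which article it belongs to, ends up pointing at a single text row.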

E.g., running our shell/perl scripts above, we find 279 separate
pointers to blank (0 byte, vandalism) article revisions. These could
all be made to point to a single text row, even though they are not
revisions of the same article.


