I spent most of three hours cleaning Wiki pages last night, going through the pages changed on 14th January and then starting on those changed on 13th January. Then I looked at the volume of change and realised that it would take a long time to finish the job: I had found and corrected about 120 pages, but there were many more changes on the 13th. My first thought was about what tool support might help me to fix the spam, but I realised I should analyse the scale of the problem a bit more before that.

There are ~1821 pages on the Wiki, and ~1171 have been updated in January. An Anonymous Coward with IP address 87.248.161.196 changed at least 783 pages (I and others have probably corrected some of his/her work) between 14:18 and 16:21 on 13th January. That's more than one page every 10 seconds, on average. It's not feasible for normal Wiki users to detect and correct this volume of change by hand.
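
As a quick check of that rate, here is the arithmetic written out in Python, using only the figures quoted above (783 pages, 14:18 to 16:21 on 13th January). The variable names are just for illustration.

```python
from datetime import datetime

# Figures quoted above: at least 783 pages changed between 14:18 and 16:21
# on 13 January 2006.
start = datetime(2006, 1, 13, 14, 18)
end = datetime(2006, 1, 13, 16, 21)
pages_changed = 783

elapsed_seconds = (end - start).total_seconds()
seconds_per_page = elapsed_seconds / pages_changed

print(f"{elapsed_seconds:.0f} s elapsed, {seconds_per_page:.1f} s per page")
# prints: 7380 s elapsed, 9.4 s per page
```

Since 783 is a lower bound on the number of pages changed, the real interval between edits was, if anything, shorter than this.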

On the 14th someone posing as the Instiki Importer, but with IP address 82.131.14.155, made a smaller number of changes. I have reversed those (actually now I see I missed one on the 14th, and that there were a couple of changes from that address on the 12th).

The spam I have seen is very uniform in its nature. Scanning for its signature and automatically rolling back the changes would be easy on the server side - it's much slower and more laborious from the client. I have been tending to edit rather than roll back, as earlier versions turned out to contain spam in a large number of cases. Editing requires a little care - the div containing the spam links is usually right at the end of the useful content, but sometimes it isn't, and sometimes it's truncated. Some have !OK! in front of the div, and some don't.
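
To illustrate what such a scan might look like, here is a minimal Python sketch that strips a spam div from page text. The opening-tag pattern in SPAM_OPEN is purely an assumption made for the example (the real signature would have to be taken from the spammed pages), and strip_spam is a hypothetical helper, not part of the Wiki software. It allows for the optional !OK! prefix and for a div that is truncated before its closing tag.

```python
import re

# ASSUMPTION: this opening tag is a made-up stand-in for the real spam
# signature, which would need to be copied from the affected pages.
SPAM_OPEN = r'<div style="overflow:\s*auto[^"]*">'

SPAM_BLOCK = re.compile(
    r'(?:!OK!\s*)?'        # optional marker seen in front of some spam divs
    + SPAM_OPEN
    + r'.*?'               # the spam links themselves
    + r'(?:</div>|\Z)',    # closing tag, or end of page if the div is truncated
    re.DOTALL | re.IGNORECASE,
)

def strip_spam(page_text):
    """Return (cleaned_text, number_of_spam_blocks_removed)."""
    cleaned, count = SPAM_BLOCK.subn("", page_text)
    return cleaned, count

if __name__ == "__main__":
    sample = (
        "Useful content\n"
        '!OK!<div style="overflow:auto; height:1px;">'
        '<a href="http://spam.example/">x</a></div>\n'
        "More content"
    )
    print(strip_spam(sample))
```

Because the match stops at the first closing div tag, a spam block that itself contained nested divs would only be partly removed, so anything a script like this flags would still deserve a quick look by eye.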

In normal use, the Wiki appears to get of the order of 20 changes a day, from a variety of users. This is hard to see among the spam-adding and spam-removing traffic. Once the Wiki is clean, it might be reasonable to introduce a limit on the number of pages a given user could create or edit per hour. More intensive use might require privileges of some kind.
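
A per-user hourly limit of that kind could be sketched roughly as follows, in Python. This is only an illustration: the class name EditRateLimiter, the default of 20 edits per hour and the use of the IP address as the user key are all assumptions for the example, not anything the Wiki currently provides.

```python
from collections import defaultdict, deque

class EditRateLimiter:
    """Allow at most max_edits per user within any window_seconds span."""

    def __init__(self, max_edits=20, window_seconds=3600):
        # ASSUMPTION: 20 edits per hour is an arbitrary illustrative limit.
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        # user key (e.g. IP address) -> timestamps of that user's recent edits
        self._recent = defaultdict(deque)

    def allow(self, user, now):
        """Permit and record an edit at time now (in seconds) if under the limit."""
        stamps = self._recent[user]
        # Forget edits that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_edits:
            return False
        stamps.append(now)
        return True

if __name__ == "__main__":
    limiter = EditRateLimiter()
    # Replay 783 edits from one address at one edit every 10 seconds.
    allowed = sum(limiter.allow("87.248.161.196", i * 10) for i in range(783))
    print(allowed, "of 783 edits would have been accepted")
```

Keying on the IP address is the simplest choice for anonymous edits, though it would not stop a spammer who spreads the work over many addresses.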

Apart from the spam, the Wiki seems fragile and crude. For example, the LighttpdConfig page was causing a Rails Application Error until I rearranged the <pre> and <code> tags to nest properly, and getting to an Edit page required manually typing in the URL to create a new version. (The RailsAcademy and Tutorial pages are in a similar state.) Page names and content don't seem to be properly escaped (scroll down the All Pages list in IE to see what I mean). "Back in time" displays two copies of the earlier version. And the facilities of the wiki (search, xref, diff etc.) are weak compared with others. I don't think it does Rails any credit to depend on such poor quality supporting tools - it just appears to be an extreme example of NIH.

regards

  Justin