I spent the best part of three hours cleaning Wiki pages last night,
going through the pages changed on 14th January and then starting on
13th January - at which point I looked at the volume of change and
realised it would take a long time to finish the job. I had found and
corrected about 120 pages, but there were many more changes on the
13th. My first thought was about what tool support might help me fix
the spam, but then I realised I should analyse the scale of the problem
a bit more first.
There are ~1821 pages on the Wiki, and ~1171 have been updated in
January. An Anonymous Coward with IP address 87.248.161.196 changed at
least 783 pages (I and others have probably corrected some of his/her
work since) between 14:18 and 16:21 on 13th January 2006. That's more
than one page every 10 seconds, on average. It's not feasible for normal
Wiki users to detect and correct this volume of change by hand.
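As a quick sanity check on that rate (the times and page count are the
ones from the change log above):

    pages   = 783
    minutes = 2 * 60 + 3              # 14:18 to 16:21
    seconds_per_page = (minutes * 60.0) / pages
    puts seconds_per_page             # => roughly 9.4 seconds per page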
On the 14th someone posing as the Instiki Importer, but with IP address
82.131.14.155, made a smaller number of changes. I have reversed those
(though I now see I missed one on the 14th, and that there were also a
couple of changes from that address on the 12th).
The spam I have seen is very uniform in nature. Scanning for its
signature and automatically rolling back the changes would be easy on
the server side - it's much slower and more laborious from the client. I
have been tending to edit rather than roll back, as earlier versions
turned out to contain spam in many cases. Editing requires a little
care - the div containing the spam links is usually right at the end of
the useful content, but sometimes it isn't, and sometimes it's
truncated. Some have !OK! in front of the div, and some don't.
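Something along these lines (a rough sketch, not Instiki code) is what
I mean by a server-side clean-up. The pattern is only a placeholder -
the real spam markup, which I haven't reproduced here, would need to go
into it:

    # Placeholder only - the actual link-farm markup belongs in this
    # pattern. It matches an optional !OK! marker plus the final <div>
    # block of a page body, whether it is closed or truncated.
    SPAM_DIV = /(?:!OK!\s*)?<div\b[^>]*>(?:(?!<div\b).)*\z/m

    # Strip the trailing spam div from one page body; this covers the
    # usual case where the spam sits right after the useful content.
    def strip_trailing_spam(body)
      body.sub(SPAM_DIV, '')
    end

The pages where the div is not at the very end, or where the earlier
versions also contain spam, would still need looking at by hand.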
In normal use, the Wiki appears to get of the order of 20 changes a day,
from a variety of users. This is hard to see among the spam-adding and
spam-removing traffic. Once the Wiki is clean, it might be reasonable to
introduce a limit on the number of pages a given user could create or
edit per hour. More intensive use might require privileges of some kind.
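For illustration, a minimal sketch of that kind of limit (the figure of
30 edits an hour and the in-memory store are just assumptions, not
anything Instiki actually has):

    # Track edit timestamps per user/IP and refuse edits over the limit.
    class EditRateLimiter
      WINDOW = 60 * 60      # one hour, in seconds
      LIMIT  = 30           # edits allowed per window (arbitrary figure)

      def initialize
        @edits = Hash.new { |h, k| h[k] = [] }
      end

      # Returns true if the edit may proceed, false if the user has hit
      # the limit within the last hour.
      def allow?(user_ip, now = Time.now)
        recent = @edits[user_ip].select { |t| now - t < WINDOW }
        @edits[user_ip] = recent
        return false if recent.size >= LIMIT
        recent << now
        true
      end
    end

A limit like that would have stopped the 13th January run within a few
minutes, while leaving the normal ~20 changes a day untouched.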
Apart from the spam, the Wiki seems fragile and crude. For example, the
LighttpdConfig page was causing a Rails Application Error until I
rearranged the <pre> and <code> tags to nest properly - and getting to
an Edit page at all required manually typing in the URL to create a new
version. (The RailsAcademy and Tutorial pages are in a similar state.)
Page names and content don't seem to be properly escaped (scroll down
the All Pages list in IE to see what I mean; there's a sketch of the
kind of escaping I'd expect below). "Back in time" displays two copies
of the earlier version. And the facilities of the wiki (search, xref,
diff etc.) are weak compared with other wikis. I don't think it does
Rails any credit to depend on such poor-quality supporting tools - it
just looks like an extreme example of NIH.
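By escaping I mean something like this - illustration only, not Instiki
code, and the list markup is just an assumption about how All Pages is
rendered:

    require 'cgi'

    # HTML-escape a page name before it goes into a listing page.
    def safe_list_entry(page_name)
      "<li>#{CGI.escapeHTML(page_name)}</li>"
    end

    puts safe_list_entry('<script>alert(1)</script>')
    # => <li>&lt;script&gt;alert(1)&lt;/script&gt;</li>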
regards
Justin