2012/3/18 Chris Watkins <[email protected]>

>
> There are a couple of options then. If the first solution, from Bináris,
> requires the page to be identified, I could make a list from AllPages for
> that namespace...
>
Yes, and once you clean it up properly, you may repeat the cleaning
regularly, e.g. on a weekly basis, from Recentchanges, which is much faster.

>
> But for now (given my lack of skill in SQL and Python)

Let me tell you how I became a Python programmer. I began to use
Pywikipedia, then I realized that I wanted to modify something for my own
needs and tried to understand the code; then I began to experiment with
basic.py, and then I began to write my own scripts... Now I use Python as
my general hobby programming language.


> it occurs to me that I can do a search for any match from a list of spam
> strings and replace with a delete tag.  "(Florida|real estate|home
> insurance... )"  - I have a list of a few hundred spammy phrases.

That's a way, too, if your list is comprehensive enough. I strongly suggest
using fixes instead of command-line replacements. You may create a fix
in fixes.py or user-fixes.py with all your stopwords, following the same
pattern; you can't type a few hundred words on the command line. If your
list is in a well-ordered form, you may put the words in a column of an
Excel table, create the replacements with a text function, and copy them
back to user-fixes.py.
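As a minimal sketch of the user-fixes.py approach (the fix name 'spamwords', the edit summary, and the sample phrases are placeholders for your own list; the dict follows the classic pywikipedia fix format):

```python
import re

# Your few hundred spammy phrases would go here, one per list item.
spam_phrases = ['Florida', 'real estate', 'home insurance']

# Build a single alternation instead of typing every word by hand.
pattern = r'\b(?:%s)\b' % '|'.join(re.escape(p) for p in spam_phrases)

fixes = {}  # in user-fixes.py this dict already exists
fixes['spamwords'] = {
    'regex': True,
    'msg': {'en': u'Robot: tagging suspected spam'},
    'replacements': [
        # Replace the first spam phrase found with a delete tag.
        (pattern, r'{{delete|suspected spam}}'),
    ],
}
```

You would then run something like replace.py -fix:spamwords over your page list.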
You may also want to replace these words with a special category instead of
a delete tag and tell delete.py to kill them en masse.
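The category variant might look like this (a sketch only; the category name and phrases are placeholders, and in practice the pattern would come from your full stopword list):

```python
import re

# Alternation built from the spam phrase list, as above.
pattern = r'\b(?:Florida|real estate|home insurance)\b'
category = '[[Category:Candidates for speedy deletion]]'

text = 'Great Florida homes for sale'

# Replace the first hit with the category marker, so delete.py
# (or a category listing) can collect the tagged pages later.
tagged = re.sub(pattern, category, text, count=1)
```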

One more question: do you know the spam blacklist? It is an admins' tool
for listing unwanted websites. You just put the offending website on the
blacklist, and users will be unhappy to find they cannot save a page while
it contains a link to it. :-)


-- 
Bináris
_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
