On Mon, Apr 12, 2010 at 18:05, Merlijn van Deen <[email protected]> wrote:
> Searching by using a text dump sounds more reasonable to me.
How would you do this?
E.g. I want to create a list of all pages with tables - i.e. with the string
"{|". The MediaWiki search won't do this, but I assume it's possible with a
site dump. But I don't know the command to use.
Thanks
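For reference, this is the kind of thing a dump scan amounts to - a minimal sketch using only the Python standard library. The element names follow the MediaWiki XML export format, but treat the details (namespace handling, tag names) as assumptions rather than a tested recipe:

```python
import xml.etree.ElementTree as ET
from io import StringIO

def pages_containing(dump, needle):
    """Yield titles of pages whose wikitext contains `needle`.

    `dump` is a file-like object with MediaWiki XML export data.
    """
    title = None
    for event, elem in ET.iterparse(dump):
        # strip the export namespace, e.g. "{http://...}title" -> "title"
        tag = elem.tag.rsplit('}', 1)[-1]
        if tag == 'title':
            title = elem.text
        elif tag == 'text' and elem.text and needle in elem.text:
            yield title
        elem.clear()  # free memory as we go; real dumps are large

# tiny inline example standing in for a real dump file
sample = StringIO("""<mediawiki>
  <page><title>Has table</title><revision><text>{| class="x" |}</text></revision></page>
  <page><title>No table</title><revision><text>plain prose</text></revision></page>
</mediawiki>""")
print(list(pages_containing(sample, '{|')))  # ['Has table']
```

On a real dump you would pass an open file (or a decompressing wrapper) instead of the StringIO object.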
> If you insist on changing replace.py, make sure you are removing all
> occurrences of both put and put_async.
>
> Best regards,
> Merlijn 'valhallasw' van Deen
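A sketch of what that advice amounts to, using a stand-in Page class (in the old framework the real calls live on wikipedia.Page; the names here are only illustrative). The idea is to rebind both save entry points to a logger before the bot runs, so every would-be edit is recorded instead of written:

```python
# Stand-in for the framework's Page class, just to demonstrate the idea.
class Page:
    def __init__(self, title):
        self._title = title
    def title(self):
        return self._title
    def put(self, newtext, comment=None):
        raise RuntimeError('would really save!')
    put_async = put

matches = []

def log_instead_of_put(self, newtext, comment=None):
    matches.append(self.title())  # record the match, skip the save

# Patch BOTH entry points, as Merlijn notes.
Page.put = Page.put_async = log_instead_of_put

Page('Some table page').put('{| ... |}')
Page('Another hit').put_async('new text')
print(matches)  # ['Some table page', 'Another hit']
```

The same monkey-patch, applied to the real Page class before running the bot, would avoid editing replace.py at all.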
>
>
> On 12 April 2010 09:54, Chris Watkins <[email protected]> wrote:
>
>> So I haven't found a way to make a list of matches without replacing. I
>> suspect there's a very simple way, or it would take very simple changes to
>> replace.py.
>>
>>
>> I tried editing replace.py myself, to make it do everything except replace
>> the pages. Then I could hack the log files to get the list I want. But I had
>> no success - I'm not a coder, so it was guesswork.
>>
>> I copied replace.py to a new file intended to do everything except put
>> files, and called it *replacenoput.py* (i.e. "replace," but no "put")
>>
>> My first attempt was to remove this section (I commented it out first, then
>> removed it entirely to be sure):
>>
>>     if self.acceptall and new_text != original_text:
>>         try:
>>             page.put(new_text, self.editSummary)
>>         except wikipedia.EditConflict:
>>             wikipedia.output(u'Skipping %s because of edit conflict'
>>                              % (page.title(),))
>>         except wikipedia.SpamfilterError, e:
>>             wikipedia.output(
>>                 u'Cannot change %s because of blacklist entry %s'
>>                 % (page.title(), e.url))
>>         except wikipedia.PageNotSaved, error:
>>             wikipedia.output(u'Error putting page: %s'
>>                              % (error.args,))
>>         except wikipedia.LockedPage:
>>             wikipedia.output(u'Skipping %s (locked page)'
>>                              % (page.title(),))
>>
>>
>> Fail - it made the changes all the same.
>>
>> Then I figured out that wikipedia.py was being used to put the files. So I
>> copied that to a new file *wikipedianoput.py* and changed every wikipedia
>> reference in *replacenoput.py* to wikipedianoput.
>>
>> Then I scanned through wikipedianoput.py looking for what I need to
>> block... but I couldn't tell.
>>
>> Can anyone help? Or even better, is there a more elegant way?
>>
>> Thanks
>> Chris
>>
>>
>> On Fri, Apr 2, 2010 at 00:12, Daniel Mietchen
>> <[email protected]> wrote:
>>
>>> Hi Chris,
>>>
>>> On Thu, Apr 1, 2010 at 2:26 PM, Chris Watkins
>>> <[email protected]> wrote:
>>> > Thanks Daniel... I'm confused though.
>>> >
>>> > On Thu, Apr 1, 2010 at 20:25, Daniel Mietchen
>>> > <[email protected]> wrote:
>>> >>
>>> >> Perhaps
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/copyright.py
>>> >> will do the trick,
>>> >
>>> > I can't see how to use it for matching a specific string.
>>> Nor do I - sorry. What I had in mind was to apply it to a page that
>>> contains your search string, and to restrict the search for "copyright
>>> violations" to your site.
>>> But this may indeed be a dead end.
>>>
>>> >> or simply
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py
>>> >> in -debug mode?
>>> >
>>> > Where can I find information on -debug mode? I see there is -verbose
>>> > mode which "may be helpful when debugging", but I don't see how that
>>> > helps.
>>> I thought that most PWB scripts had it, but apparently replace.py does
>>> not.
>>>
>>> But if the line
>>>
>>>     def __init__(self, reader, force, append, summary, minor, autosummary, debug):
>>>
>>> contains "debug" (as in this example, taken from
>>> http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pagefromfile.py?view=markup),
>>> then -debug is an option with which the script can be run so that it
>>> performs all its actions except editing the pages.
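The convention described there can be sketched roughly like this; all names below are illustrative stand-ins, not the real pagefromfile.py internals:

```python
# Rough sketch of a -debug convention: the flag is read from the command
# line, and when it is set the script reports the edit it would have
# made instead of saving it.
def run(argv, pages):
    debug = '-debug' in argv
    actions = []
    for title, old_text, new_text in pages:
        if old_text == new_text:
            continue                          # nothing to do for this page
        if debug:
            actions.append(('report', title))  # dry run: just report
        else:
            actions.append(('save', title))    # real script would put() here
    return actions

print(run(['-debug'], [('A', 'x', 'y'), ('B', 'z', 'z')]))
# [('report', 'A')]
```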
>>>
>>> I am not very experienced with Python or PWB either, but since nobody
>>> had replied so far, I wrote out my ideas as they came to mind.
>>> Sorry for the confusion,
>>>
>>> Daniel
>>>
>>> > I may be missing something obvious &-)
>>> Me too.
>>>
>>> > Chris
>>> >
>>> >
>>> >>
>>> >> Daniel
>>> >>
>>> >> On Thu, Apr 1, 2010 at 6:05 AM, Chris Watkins
>>> >> <[email protected]> wrote:
>>> >> > I want to generate a list of matches for a search, but not do
>>> >> > anything to the page.
>>> >> >
>>> >> > E.g. I want to list all pages that contain "redirect[[:Category",
>>> >> > but I don't want to modify the pages.
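One pitfall worth noting with that search string: "redirect[[:Category" contains characters that are special in regular expressions, so anything that treats the pattern as a regex needs it escaped first. A quick check in plain Python:

```python
import re

needle = 'redirect[[:Category'
# '[' is a regex metacharacter, so the raw string is not even a valid
# pattern; re.escape turns it into a literal match.
pattern = re.compile(re.escape(needle), re.IGNORECASE)

print(bool(pattern.search('#REDIRECT[[:Category:Foo]]')))  # True
print(bool(pattern.search('an ordinary page')))            # False
```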
>>> >> >
>>> >> > I guess that it's possible to modify redirect.py (I don't speak
>>> >> > Python, but it shouldn't be hard) and run it with -log. But maybe
>>> >> > there's a simpler way?
>>> >> >
>>> >> > Thanks in advance.
>>> >> >
>>> >> > --
>>> >> > Chris Watkins
>>> >> >
>>> >> > Appropedia.org - Sharing knowledge to build rich, sustainable lives.
>>> >> >
>>> >> > blogs.appropedia.org
>>> >> > community.livejournal.com/appropedia
>>> >> > identi.ca/appropedia
>>> >> > twitter.com/appropedia
>>> >> >
>>> >> > _______________________________________________
>>> >> > Pywikipedia-l mailing list
>>> >> > [email protected]
>>> >> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> http://www.google.com/profiles/daniel.mietchen
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Chris Watkins
>>> >
>>> > Appropedia.org - Sharing knowledge to build rich, sustainable lives.
>>> >
>>> > blogs.appropedia.org
>>> > community.livejournal.com/appropedia
>>> > identi.ca/appropedia
>>> > twitter.com/appropedia
>>> >
>>> > _______________________________________________
>>> > Pywikipedia-l mailing list
>>> > [email protected]
>>> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> http://www.google.com/profiles/daniel.mietchen
>>>
>>> _______________________________________________
>>> Pywikipedia-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>>
>>
>>
>>
>> --
>> Chris Watkins
>>
>> Appropedia.org - Sharing knowledge to build rich, sustainable lives.
>>
>> blogs.appropedia.org
>> community.livejournal.com/appropedia
>> identi.ca/appropedia
>> twitter.com/appropedia
>>
>> _______________________________________________
>> Pywikipedia-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>
>>
>
>
>
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia