[Pywikipedia-l] replacing based on two regexes not near each other

Michael Hamm Wed, 01 May 2013 23:41:07 -0700

Hi,

Each ns:0 page on English Wiktionary is divided into a bunch of
sections headed by level-2 headers.  The text of each level-2 header
is the name of a language; e.g., ==English==.


I use (something like) the following JavaScript when editing pages:

txt =
  txt.replace
  ( /^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*/gm,
    function(section, langname)
    { return '' +
        section.replace
        ( /(\{\{homophones\|)([^=}]*\}\})/gm,
          '$1lang={'+'{subst:langrev|'+langname+'}}|$2'
        );
    }
  );

This searches for {{homophones|...}} without a lang= parameter and
adds the lang= parameter appropriate for the ==section== in which
{{homophones|...}} appears.  This works.
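
For reference, the same two-stage idea can be sketched in standalone Python, outside any bot framework. This is only a minimal sketch: the names SECTION_RE, HOMOPHONES_RE and add_lang are made up for illustration, the sample text is invented, and it assumes Python 3 with the section pattern compiled under re.MULTILINE so that ^ matches at the start of every line.

```python
import re

# A level-2 header line plus every following line that is not itself a
# level-2 header (blank lines are absorbed by the \n+ runs).
SECTION_RE = re.compile(r'^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*',
                        re.MULTILINE)

# {{homophones|...}} whose arguments contain no '=' (hence no lang=).
HOMOPHONES_RE = re.compile(r'(\{\{homophones\|)([^=}]*\}\})')


def add_lang(text):
    """Add lang= to each bare {{homophones|...}}, section by section."""
    def fix_section(section_match):
        langname = section_match.group(1)

        def fix_template(template_match):
            # Build the replacement by concatenation so the language
            # name is inserted literally, with no escaping involved.
            return (template_match.group(1)
                    + 'lang={{subst:langrev|' + langname + '}}|'
                    + template_match.group(2))

        return HOMOPHONES_RE.sub(fix_template, section_match.group(0))

    return SECTION_RE.sub(fix_section, text)


# Invented sample text, not taken from any real Wiktionary page.
sample = (
    "==English==\n"
    "===Pronunciation===\n"
    "* {{homophones|foo}}\n"
    "\n"
    "==Old English==\n"
    "===Pronunciation===\n"
    "* {{homophones|lang=ang|bar}}\n"
    "* {{homophones|baz}}\n"
)
result = add_lang(sample)
print(result)
```

Using a function for the inner replacement, rather than a replacement string, means a language name containing a space (such as "Old English") needs no escaping at all.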

I want to automate this, so I wish to use pywikipediabot.  I've
translated the above into Python as best I could and come up with the
following user-fixes.py:

def homophix(match):
    return re.sub(r'(\{\{homophones\|)([^}=]*\}\})',
                  r'\1lang={{subst:langrev|'+re.escape(match.group(1))+r'}}|\2',
                  match.group(0)
                  )

fixes['homophones'] = {
    'regex': True,
    'msg': {'_default':u'add lang to homophones'},
    'replacements': [
        (ur'^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*', homophix)
    ]
}

...which I then tried to call using
python replace.py -fix:homophones -page:accapare


(Note that [[wikt:en:accapare]] has a {{homophones|...}} call that
contains no = sign, so the inner regex should match it.)

Python told me:
No changes were necessary in [[accapare]]
0 pages were changed.

So I guess it's either not matching or not replacing.
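
One way to tell the two cases apart is to try the section pattern on its own in plain Python. The sketch below assumes, without confirmation from anything in this post, that the pattern may end up compiled without re.MULTILINE; in that case ^ matches only at the very start of the page text, so a level-2 header on any later line is never found. The page text here is hypothetical, not the real content of [[accapare]].

```python
import re

SECTION_PATTERN = r'^==([a-zA-Z ]+)==\n+(?:(?:===|[^=]).*\n+)*'

# Hypothetical page text: the first level-2 header is not on line one.
page = (
    "{{some template}}\n"
    "==English==\n"
    "===Pronunciation===\n"
    "* {{homophones|x}}\n"
)

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
without_flag = re.findall(SECTION_PATTERN, page)

# With re.MULTILINE, ^ also anchors at the start of each line.
with_flag = re.findall(SECTION_PATTERN, page, flags=re.MULTILINE)

# The inline form (?m) turns the same flag on from inside the pattern.
with_inline_flag = re.findall('(?m)' + SECTION_PATTERN, page)

print(without_flag)
print(with_flag)
print(with_inline_flag)
```

If the first list is empty and the other two are not, the outer pattern is the part that fails to match, and prefixing it with (?m) is one possible fix to try.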

What am I doing wrong?

And what can I do instead?

Thanks,

Michael Hamm
