On Sat, Dec 24, 2011 at 20:41, Dr. Trigon <[email protected]> wrote:

>
> I know that heading or section recognition inside the framework has
> mostly been done with regexes (e.g. in the archive bot)... I have always
> felt that this is not reliable, since there are a lot of odd possible
> situations.


That's true - the regex solution that I gave works sometimes, but sometimes
it still matches inside headers. Don't know why - haven't debugged it yet.
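
For anyone who wants to test these patterns outside the bot, here is a minimal Python sketch. It only assumes that the patterns are handed to Python's re module more or less as typed on the command line; the sample text and the helper name matched_spans are made up for illustration, and this is not how replace.py itself is written.

```python
import re

# Made-up sample wikitext: one real heading, plus a body line that
# happens to contain two '=' characters.
SAMPLE = "Intro text\n== Solar power ==\nSet a=1 and b=2 here.\n"

# Unanchored pattern from the thread.
LOOSE = r"=[^\n\r]*=[ \t]*"

# Anchored variant from the thread.
ANCHORED = r"^=[^\n\r]*=[ \t]*$"


def matched_spans(pattern, text, flags=0):
    """Return every substring of text that pattern matches (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


# The loose pattern matches the heading, but also any run between two '='.
loose_hits = matched_spans(LOOSE, SAMPLE)

# Without re.MULTILINE, '^' only matches at the very start of the text.
anchored_default = matched_spans(ANCHORED, SAMPLE)

# With re.MULTILINE, '^' and '$' match at the start and end of each line.
anchored_multiline = matched_spans(ANCHORED, SAMPLE, re.MULTILINE)

print(loose_hits)
print(anchored_default)
print(anchored_multiline)
```

On this sample the loose pattern picks up "=1 and b=" as well as the heading, the anchored pattern finds nothing with default flags, and it finds only the heading once re.MULTILINE is set. If the bot compiles the pattern without that flag (an assumption, not checked against the source), that would be one possible reason the '^...$' tweak never matched.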


> So I wrote a 'getSections' method for DrTrigonBot, but
> I don't know whether it could be of any use to you...
>
> Anyway feel free to have a look at it and use it if you like... ;)
>
>
> https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_wikipedia.py?hb=true#to122
>

Hmm... it's beyond me, as I don't speak Python. Not sure how to use it. :-(

Thanks anyway!
Chris


> Greetings
>
> On 22.12.2011 09:18, Chris Watkins wrote:
> > I just worked it out, mostly... instead of:
> > -exceptinsidetag:header
> >
> > I used: -exceptinside:'=[^\n\r]*=[ \t]*'
> >
> > And it worked!
> >
> > There might be a small risk of false positives, so I tried various
> > tweaks, e.g.
> > -exceptinside:'^=[^\n\r]*=[ \t]*$'
> > -exceptinside:'[\n\r]=[^\n\r]*=[ \t]*[\n\r]'
> > -exceptinside:'[\n\r]=[^\n\r]*='
> >
> > But none worked... any suggestions?
> >
> > On Thu, Dec 22, 2011 at 18:21, Chris Watkins wrote:
> >
> > I have been using " -exceptinsidetag:header" with replace.py. This
> > was added by Daniel Herding in response to a request by me:
> >
> > On Mon, Jun 30, 2008 at 23:11, Daniel Herding wrote:
> >
> >
> >
> > This will exclude wikilinks and URLs. There are some more things
> > that can be excluded, see the source code of the method
> > replaceExcept() in wikipedia.py (look at the exceptionRegexes
> > dictionary). I have just added a regular expression for section
> > headers for you, so if you're running the SVN version, you can use
> > this parameter:
> >
> > -exceptinsidetag:header
> >
> >
> >
> > I seem to recall this working in a nightly version a couple of
> > years ago, but it's not working now - I'm not sure when it stopped.
> > Is it possible to put it back in?
> >
> > Thanks!
> >
> >
> > -- Chris Watkins
> >
> > Appropedia.org - Sharing knowledge to build rich, sustainable
> > lives.
> >
> >
> >
> > _______________________________________________ Pywikipedia-l
> > mailing list [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
>
>



-- 
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.