-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As far as I know, heading or section recognition inside the framework
has mostly been done with regexes (e.g. in the archive bot). I have
always felt that this is not reliable, since there are a lot of odd
edge cases. So I wrote a 'getSections' method for DrTrigonBot, but I
do not know whether it could be of any use to you...

Anyway, feel free to have a look at it and use it if you like... ;)

https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_wikipedia.py?hb=true#to122
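
To illustrate what I mean by "not reliable", here is a minimal sketch of
the plain regex approach. It is not the getSections code linked above;
the name find_headers, the pattern, and the sample text are made up for
this mail, and the pattern assumes a header is a single line with the
same number of '=' on both sides.

```python
import re

# Assumption: a header is a line like "== Title ==" with 1 to 6 '=' on
# each side. This is a simplification of the real wikitext rules.
HEADER_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def find_headers(text):
    """Return (level, title) tuples for every header-like line in text."""
    return [(len(m.group(1)), m.group(2)) for m in HEADER_RE.finditer(text)]


sample = (
    "== Intro ==\n"
    "Some text with a = b = c inline.\n"
    "=== Details ===  \n"
    "<nowiki>\n"
    "== Not a real header ==\n"
    "</nowiki>\n"
)

headers = find_headers(sample)
print(headers)
```

The line inside <nowiki> is reported as a level 2 header as well, although
it is not rendered as one. A line-based regex alone cannot see that
context, which is one of the odd situations I had in mind.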

Greetings

On 22.12.2011 09:18, Chris Watkins wrote:
> I just worked it out, mostly... instead of: 
> -exceptinsidetag:header
> 
> I used: -exceptinside:'=[^\n\r]*=[ \t]*'
> 
> And it worked!
> 
> There might be a small risk of false positives, so I tried various
> tweaks, e.g.:
>
> -exceptinside:'^=[^\n\r]*=[ \t]*$'
> -exceptinside:'[\n\r]=[^\n\r]*=[ \t]*[\n\r]'
> -exceptinside:'[\n\r]=[^\n\r]*='
> 
> But none worked... any suggestions?
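
One thing that might explain the anchored variant, though I have not
checked how replace.py compiles the pattern: in Python's re module, '^'
and '$' only match at line boundaries when re.MULTILINE is set. A small
sketch with plain re (the sample text is made up, and this does not call
replace.py):

```python
import re

text = "Intro text\n== Header ==\nBody text\n"

# Pattern from the quoted message, with line anchors added.
anchored = r"^=[^\n\r]*=[ \t]*$"

# Without re.MULTILINE, '^' only matches at the start of the whole
# string, so a header in the middle of the page is not found.
plain_hits = re.findall(anchored, text)

# With re.MULTILINE, '^' and '$' match at every line boundary.
multiline_hits = re.findall(anchored, text, re.MULTILINE)

# The same flag can be set inside the pattern itself with '(?m)'.
inline_hits = re.findall(r"(?m)" + anchored, text)

print(plain_hits, multiline_hits, inline_hits)
```

If replace.py hands the string to re.compile() unchanged (an assumption
on my side), prefixing the pattern with '(?m)' might be worth a try, but
I have not tested that.
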
> 
> On Thu, Dec 22, 2011 at 18:21, Chris Watkins wrote:
> 
> I have been using "-exceptinsidetag:header" with replace.py. This
> was added by Daniel Herding in response to a request from me:
> 
> On Mon, Jun 30, 2008 at 23:11, Daniel Herding wrote:
> 
> 
> 
> This will exclude wikilinks and URLs. There are some more things 
> that can be excluded, see the source code of the method
> replaceExcept() in wikipedia.py (look at the exceptionRegexes
> dictionary). I have just added a regular expression for section
> headers for you, so if you're running the SVN version, you can use
> this parameter:
> 
> -exceptinsidetag:header
> 
> 
> 
> I seem to recall this working in a nightly version a couple of
> years ago, but it's not working now - I'm not sure when it stopped.
> Is it possible to put it back in?
> 
> Thanks!
> 
> 
> -- Chris Watkins
> 
> Appropedia.org - Sharing knowledge to build rich, sustainable
> lives.
> 
> 
> _______________________________________________
> Pywikipedia-l mailing list
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk71ni0ACgkQAXWvBxzBrDBEKQCgwDB6gNylbEgXPxfld1M7sAhL
9XUAoIhYypqoyM3FzUCNSgJ7bT+6QLoj
=yxc+
-----END PGP SIGNATURE-----
