Thought I'd point out a couple of useful things I've come across when doing regex work (in Python, but also in other languages):
1: The re.VERBOSE flag. It lets you write a regular expression as a multiline string and add comments to it. Whitespace inside the pattern is ignored, so you have to escape literal whitespace or use \s. This makes it a lot easier to understand what you were thinking when you come back to your code two months later to change it.

2: Using functions instead of strings as the replacement in sub(). If your replacement needs a fair amount of conditional logic, it is often easier to have a function do it than to attempt it all within the regex.

My $.02.

Cheers,
Morten

On Tue, Jun 28, 2011 at 7:23 AM, Bináris <[email protected]> wrote:

> OK, then I make separate lines. The only issue is that any
> enhancement/correction will be more complicated this way (that is another
> reason to put as many features in one line as possible).
>
> 2011/6/28 Marcin Cieslak <[email protected]>
>
>> Given the speed of fetching/storing pages I don't think that speed of the
>> regular expression makes any difference. Running two compiled RE's
>> one after the other in sequence on the page text should be very fast.
>
> --
> Bináris
>
> _______________________________________________
> Pywikipedia-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
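
For anyone who wants to see the two tips from this message (re.VERBOSE and a function as the replacement in sub()) side by side, here is a minimal sketch. It assumes a made-up task of rewriting ISO-style dates in a piece of text; the names DATE, MONTHS and to_long_date are hypothetical and only there for illustration, not anything from pywikipedia.

```python
import re

# Tip 1: with re.VERBOSE the pattern can span several lines and carry
# comments. Unescaped whitespace in the pattern is ignored, so a literal
# space would have to be written as "\ ", "[ ]" or \s.
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    -                  # literal separator
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

# Hypothetical lookup table used by the replacement function below.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


# Tip 2: sub() accepts a function; it is called once per match with the
# match object and must return the replacement string.
def to_long_date(match):
    month = int(match.group("month"))
    if not 1 <= month <= 12:
        # Conditional logic that would be awkward inside the regex:
        # leave anything that is not a plausible month untouched.
        return match.group(0)
    return "%s %d, %s" % (MONTHS[month - 1],
                          int(match.group("day")),
                          match.group("year"))


text = "Created 2011-06-28, checked 2011-13-01."
result = DATE.sub(to_long_date, text)
print(result)
```

The range check on the month is the kind of condition that is simple in a function but clumsy to express in the pattern itself, which is the reason for passing a function to sub() here.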
