I'm unsure if this is the right place to voice my thoughts on such a thing, but given the idealism of Python (particularly as an antithesis to many of the ideas of Perl), it occurred to me, after trying to fix a broken Perl script late at night, that regular expressions are somewhat un-pythonic. The Python 're' module, although more versatile than regular expressions in Perl, is something I always have to refer to the manual for, in spite of the number of times I've used it. That is rare for a small Python module. In other words, I'm tempted to stretch our beloved term "unpythonic" to cover regular expressions.
So I thought it's time to start something new, perhaps as a Python module. I've googled around to see if there are any attempts at an alternative out there, and found nothing, although some people have written very good articles about how regular expressions are a problem in a number of ways:

1) They look horrible. Like line noise. Each character is a functional unit, meaning that something that would take a paragraph to describe is reduced to a small number of characters. Given that programmers tend to spend more time thinking than typing, I don't see any advantage to this.

2) They can fail in subtle ways. An expression which works in 99% of cases can start losing characters in exceptional cases that the author never considered.

3) They can very quickly become rather long (check the expression for an email address in the back of the 'Mastering Regular Expressions' O'Reilly book).

4) The use of multi-line switches and other trailing-end characters complicates things further.

One of the great things about Python is that its string, slice, and split/join functions mean that I rarely use regular expressions in Python. In fact, I try to avoid them. But a more pythonic matching and substitution system could be a great thing.

The first thing that occurred to me, in trying to imagine what an easier-to-use alternative would look like, is that regular expressions are the wrong way round: the functional characters (the things that actually do things) are escaped, while the match strings written in text are the default. Unless you're trying to write a '/' or '\', that is, which you have to escape (carefully, if you're writing something exposed to the internet and you don't want your server hosed by a hacker). In other words, it is the match string which should be treated as special, and the special functions which should be the norm.

So, here is a first foray into this idea (I'm making this up as I go along, I should point out!).
Instead of:

    /\d+hello/

how about (explanation of syntax to follow):

    boolean = match(input, "oneormore(digit).one('hello')")

I'm using a '.' to separate lexical units here. The specifying functions indicate how many times, or under what circumstances, the unit is matched, and within the brackets are classes representing what needs to be matched. 'digit' represents '\d' in this case, and a string is just that.

Taking it a bit further:

    /\d{1,3}hello/

is replaced by

    boolean = match(input, "range(digit, (1,3)).one('hello')")

OK, so what about substitution?

    s/.*(hello).*/$1/

becomes

    result = substitute(input, "many(char)|one('hello')|many(char)", "match(0)")

Instead of dots, matches which should be captured are contained between pipe symbols. I'm still having an argument with myself as to whether some sort of function/keyword should be used instead. I dunno. That's why I emailed you guys :-)

I'm going to have a bigger think about this tomorrow, but I think it could be a great feature.

Cheers! (and thanks for a great language),

Giles

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
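
To make the proposal above a little more concrete, here is a minimal sketch of how such an interface might be layered on top of the existing 're' module. It is only an illustration under several assumptions, not a worked-out design: it uses ordinary Python function calls instead of the dotted string syntax from the message, seq() stands in for the '.' separator, capture() stands in for the pipe symbols, between() is used in place of range() so the builtin is not shadowed, and the replacement is an ordinary 're' template such as r"\1" rather than "match(0)". All of these names are hypothetical. Only re.escape, re.search and re.sub come from the standard library.

```python
import re


class Unit:
    """A piece of regex source. Plain strings passed in are treated as literals."""

    def __init__(self, pattern):
        self.pattern = pattern


# Hypothetical character classes, named after the ones used in the message.
digit = Unit(r"\d")
char = Unit(r".")


def _frag(what):
    # Unit objects already hold regex source; plain strings are literal
    # text, so every special character in them is escaped automatically.
    if isinstance(what, Unit):
        return what.pattern
    return re.escape(what)


def _group(what):
    # Wrap in a non-capturing group so a quantifier applies to the whole unit.
    return "(?:" + _frag(what) + ")"


def one(what):
    return Unit(_group(what))


def oneormore(what):
    return Unit(_group(what) + "+")


def many(what):
    return Unit(_group(what) + "*")


def between(what, lo, hi):
    # Stands in for range(digit, (1,3)) in the message.
    return Unit("%s{%d,%d}" % (_group(what), lo, hi))


def seq(*units):
    # Stands in for the '.' separator between lexical units.
    return Unit("".join(_frag(u) for u in units))


def capture(*units):
    # Stands in for the pipe symbols around a captured match.
    return Unit("(" + "".join(_frag(u) for u in units) + ")")


def match(text, unit):
    # Like Perl's /.../, this succeeds if the pattern occurs anywhere in text.
    return re.search(unit.pattern, text) is not None


def substitute(text, unit, template):
    # Like Perl's s/.../.../ without the g flag: replace the first match only.
    return re.sub(unit.pattern, template, text, count=1)


found = match("abc123hello", seq(oneormore(digit), one("hello")))
result = substitute(
    "say hello world",
    seq(many(char), capture(one("hello")), many(char)),
    r"\1",
)
```

Because literal strings go through re.escape, something like one("a.b") matches only the three characters a, dot, b, which is the "match string is the special case" inversion argued for in the message.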