"Michael Sparks" <[EMAIL PROTECTED]> wrote

> The most pathological example of regex avoidance I've seen in a while
> is this:
>
> def isPlain(text):
>     plaindict = {'-': True, '.': True, '1': True, '0': True, '3': True,
>         '2': True, '5': True, '4': True, '7': True, '6': True, '9': True,
>         '8': True, 'A': True, 'C': True, 'B': True, 'E': True, 'D': True,
>         'G': True, 'F': True, 'I': True, 'H': True, 'K': True, 'J': True,
>         'M': True, 'L': True, 'O': True, 'N': True, 'Q': True, 'P': True,
>         'S': True, 'R': True, 'U': True, 'T': True, 'W': True, 'V': True,
>         'Y': True, 'X': True, 'Z': True, '_': True, 'a': True, 'c': True,
>         'b': True, 'e': True, 'd': True, 'g': True, 'f': True, 'i': True,
>         'h': True, 'k': True, 'j': True, 'm': True, 'l': True, 'o': True,
>         'n': True, 'q': True, 'p': True, 's': True, 'r': True, 'u': True,
>         't': True, 'w': True, 'v': True, 'y': True, 'x': True, 'z': True}
>
>     for c in text:
>         if plaindict.get(c, False) == False:
>             return False
>     return True
>
> (sadly this is from real code - in defence of the person
> who wrote it, they weren't even *aware* of regexes)
>
> That's equivalent to the regular expression:
>  * ^[0-9A-Za-z_.-]*$

While using a dictionary is probably overkill, so is a regex.
A simple string holding all the characters and an 'in' test would
probably be both easier to read and faster. Which kind of
illustrates the point of the thread, I think! :-)

> Now, which is clearer? If you learn to read & write regular
> expressions, then the short regular expression is the clearest
> form. It's also quicker.

Whether it's quicker will depend on several factors, including the
implementation of the regex library as well as the length of the
string. If it's a single char I'd expect the dictionary lookup to be
faster than a regex parse or the string inclusion test... In fact,
a lookup table is how the C standard library usually implements
functions like toupper() and tolower() etc, and for speed reasons.

> to say "don't use them if there's an alternative" is a little
> strong.
> Aside from the argument that "you now have two problems"
> (which always applies if you think all problems can be hit with
> the same hammer), solving *everything* with regex is often slower.

A regex can be faster than a sequential string search. It depends
on the problem.

The thing that we are all trying to say here (I think) is that
regexes are powerful tools but dangerously complex. It's nearly
always safer and easier to use alternatives where they exist, but
when used intelligently regexes can solve difficult problems very
elegantly.

> JWZ's quote is more aimed at people who think about solving
> every problem with regexes (and where you end up with 10 line
> monstrosities in perl with 5 levels of backtracking).

Agreed, and that's what the message of the thread is about.
Use them when they are the right solution, but look for
alternatives first.

> Also, it's worth bearing in mind that there's more than one
> definition of what regex's are

To be picky, there is only one definition of what regexes are,
but there are many grammars or dialects.
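
For what it's worth, the string-plus-'in' idea for the isPlain example
could be sketched roughly as below, next to a regex version for
comparison. This is only a sketch: it assumes Python 3, and the names
PLAIN_CHARS, is_plain_in and is_plain_re are made up here, not taken
from the original code.

```python
import re
import string

# Hypothetical names, for illustration only.
# The allowed characters: letters, digits, underscore, dot and hyphen.
PLAIN_CHARS = string.ascii_letters + string.digits + "_.-"


def is_plain_in(text):
    """Return True if every character of text is in PLAIN_CHARS."""
    for c in text:
        if c not in PLAIN_CHARS:
            return False
    return True


# The same character class as the regex quoted above, compiled once.
_PLAIN_RE = re.compile(r"[0-9A-Za-z_.-]*")


def is_plain_re(text):
    """Regex version; fullmatch requires the whole string to match."""
    return _PLAIN_RE.fullmatch(text) is not None
```

One detail to watch if you go the regex route in Python: with re.match,
the pattern ^[0-9A-Za-z_.-]*$ also accepts a string ending in a single
newline, because $ matches just before a trailing newline. That is why
the sketch uses fullmatch instead. Which version is faster is best
settled by timing both on your own data rather than by guessing.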

> If your reaction to seeing a problem is "this looks like it can be
> solved using a regex", you should think to yourself: has someone else
> already hit this problem and have they come up with a specialised
> pattern matcher for it already? If not, why not?

Absolutely agree with this.

> :-)

Likewise :-)

Alan g.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor