[EMAIL PROTECTED] wrote:
> I have a regular expression that is approximately 100k bytes. (It is
> basically a list of all known norwegian postal numbers and the
> corresponding place with | in between. I know this is not the intended
> use for regular expressions, but it should nonetheless work.
>
> the pattern is
> ur'(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER|4027 STAVANGER........|8305
> SVOLVÆR)'
>
> The error message I get is:
> RuntimeError: internal error in regular expression engine

And I'm not the least bit surprised. Your code is brittle (i.e. likely to break) and cannot, for example, cope with multiple spaces between the number and the word(s). Quite apart from breaking the interpreter :-)

I'd say your test was the clearest possible demonstration that there *is* a limit.

Wouldn't it be better to have a dict keyed on the number and containing the word (which you can construct from the same source you used to construct your horrendously long regexp)? Then if you find something matching the pattern (untested)

    ur'(N-|NO-)?((\d\d\d\d)\s*([A-Za-z ]+))'

or something like it that actually works (I invariably get regexps wrong at least three times before I get them right), you can use the dict to validate the number and name.

Quite apart from anything else, if the text line you are examining doesn't have the right syntactic form, then the long alternation tests hundreds of options, none of which can possibly match. So matching the syntax first and then validating the data it identifies seems like a much more sensible option (to me, at least).

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
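
For what it's worth, the match-the-syntax-then-validate idea could be sketched roughly as follows. This is an illustration only, written for Python 3 (where the ur'' prefix no longer exists). The names POSTAL_PLACES, ADDRESS_RE and find_postal_places are made up for the example, the four-entry dict stands in for the full table, and the pattern is a variant of the untested one above that requires at least one space after the number and also accepts non-ASCII letters such as Æ.

```python
import re

# Hypothetical sample data: the real dict would be built from the same
# source that was used to generate the long alternation.
POSTAL_PLACES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
    "4027": "STAVANGER",
    "8305": "SVOLVÆR",
}

# Syntax only: optional N-/NO- prefix, exactly four digits, whitespace,
# then one or more words. [^\W\d_] matches any Unicode letter, so place
# names containing Æ, Ø or Å are accepted (an assumption of this sketch).
ADDRESS_RE = re.compile(
    r"(?:N-|NO-)?(?<!\d)(\d{4})\s+([^\W\d_]+(?:[ -]+[^\W\d_]+)*)"
)


def find_postal_places(text):
    """Return (number, place) pairs in text that agree with POSTAL_PLACES."""
    found = []
    for match in ADDRESS_RE.finditer(text):
        number, name = match.group(1), match.group(2)
        place = POSTAL_PLACES.get(number)
        if place is None:
            continue  # right syntax, but not a known postal number
        # Collapse runs of whitespace and ignore case before comparing.
        candidate = " ".join(name.split()).upper()
        # The name group is greedy, so allow ordinary words after the place.
        if candidate == place or candidate.startswith(place + " "):
            found.append((number, place))
    return found


if __name__ == "__main__":
    print(find_postal_places("Ship to NO-4027   Stavanger today"))
    print(find_postal_places("4026 BERGEN"))
```

The regexp stays a few dozen bytes long however many postal numbers there are, and a line with the wrong shape is rejected after one cheap match attempt rather than hundreds of alternatives.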