On Apr 22, 9:56 am, [EMAIL PROTECTED] wrote:
> Could someone tell me why:
>
> >>> import re
> >>> p = re.compile('\\.*\\(.*)')

Short answer: *ALWAYS* use raw strings for regexes in Python source files.

Long answer:

'\\.*\\(.*)' is equivalent to r'\.*\(.*)'

So what re.compile is seeing is:

\.  -- a literal dot or period or full stop (not a metacharacter)
*   -- meaning 0 or more occurrences of the dot
\(  -- a literal left parenthesis
.   -- dot metacharacter meaning any character bar a newline
*   -- meaning 0 or more occurrences of almost anything
)   -- a right parenthesis grouping metacharacter; a bit lonely, hence the exception.

What you probably want is r'\\.*\\(.*)', i.e.:

\\   -- literal backslash
.*   -- any stuff
\\   -- literal backslash
(.*) -- grouped (any stuff)

>
> Fails with message:
>
> Traceback (most recent call last):
>   File "<pyshell#12>", line 1, in <module>
>     re.compile('\\dir\\(file)')
>   File "C:\Python25\lib\re.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "C:\Python25\lib\re.py", line 233, in _compile
>     raise error, v # invalid expression
> error: unbalanced parenthesis
>
> I thought '\\' should just be interpreted as a single '\' and not
> affect anything afterwards...

The second and third paragraphs of the re docs
(http://docs.python.org/lib/module-re.html) cover this:

"""
Regular expressions use the backslash character ("\") to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal.

The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with "r". So r"\n" is a two-character string containing "\" and "n", while "\n" is a one-character string containing a newline.
Usually patterns will be expressed in Python code using this raw string notation.
"""

Recommended reading:
http://www.amk.ca/python/howto/regex/regex.html#SECTION000420000000000000000

>
> The script 'redemo.py' shipped with Python by default is just fine
> about this regex however.

That's because you are typing the regex into a Tkinter app. The same
would be true if you were reading the regex from (say) a config file or
typing it in response to a raw_input call. The common factor is that
the text never passes through the extra level of backslash processing
that Python applies to string literals in source code.

HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list
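The two levels of backslash processing described in the long answer can be checked at a Python 3 prompt. This is a minimal sketch using only the standard `re` module; the sample string `r'\dir\file'` is a made-up path for illustration, not something from the thread, and the exact wording of the `re.error` message differs between Python versions.

```python
import re

# The original poster's pattern as an ordinary string literal. Python turns
# each '\\' into a single backslash *before* re.compile sees the text.
plain = '\\.*\\(.*)'
raw = r'\.*\(.*)'
print(plain == raw)  # True: both hold the regex \.*\(.*)

# The regex engine reads \. as a literal dot and \( as a literal '(',
# which leaves the final ')' with nothing to close.
try:
    re.compile(plain)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)  # wording varies by Python version

# What was probably intended: literal backslash, anything, literal
# backslash, then a captured group. The r prefix lets '\\' reach re intact.
fixed = re.compile(r'\\.*\\(.*)')

# The same pattern without the r prefix needs every backslash doubled again.
print(fixed.pattern == '\\\\.*\\\\(.*)')  # True

# Hypothetical sample text, itself written as a raw string so that it
# contains real backslash characters.
m = fixed.match(r'\dir\file')
print(m.group(1))  # file
```

Note that the greedy `.*` means the captured group starts after the *last* backslash in the text, which is usually what you want for a path.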
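The redemo.py point applies to any pattern that arrives at run time rather than as a string literal in source code. A minimal sketch, assuming a hypothetical one-line config file that is simulated with `io.StringIO` so the example is self-contained (in Python 3, `raw_input` is spelled `input`, and it behaves the same way here):

```python
import io
import re

# Stand-in for a hypothetical one-line config file. The "file" holds exactly
# the characters a user would type into redemo.py or at an input prompt:
#     \\.*\\(.*)
# (written as a raw string here only so that this source file can spell it).
config_file = io.StringIO(r'\\.*\\(.*)' + '\n')

# Text read at run time is not a Python string literal, so nothing consumes
# backslashes: all four of them reach the regex engine.
pattern_text = config_file.readline().rstrip('\n')
print(len(pattern_text))  # 10
print(pattern_text == '\\\\.*\\\\(.*)')  # True: same as the doubled-up literal

pattern = re.compile(pattern_text)
print(pattern.match(r'\dir\file').group(1))  # file
```

The text typed by the user needs only the regex-level escaping; the Python-literal level of doubling simply never happens.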