On Apr 21, 6:56 pm, [EMAIL PROTECTED] wrote:
> Could someone tell me why:
>
> >>> import re
> >>> p = re.compile('\\.*\\(.*)')
>
> Fails with message:
>
> Traceback (most recent call last):
>   File "<pyshell#12>", line 1, in <module>
>     re.compile('\\dir\\(file)')
>   File "C:\Python25\lib\re.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "C:\Python25\lib\re.py", line 233, in _compile
>     raise error, v # invalid expression
> error: unbalanced parenthesis
>
> I thought '\\' should just be interpreted as a single '\' and not
> affect anything afterwards...
>
> The script 'redemo.py' shipped with Python by default is just fine
> about this regex however.

You are getting overlap between the Python string literal's backslash
escaping and re's backslash escaping. In a Python string literal, '\\'
gets collapsed down to a single '\', so the pattern that re actually
receives is \.*\(.*) and in that pattern \( is an escaped, literal
parenthesis. The closing ) then has no opening group to match, hence
"unbalanced parenthesis". To get your desired result, you would need to
double-double every '\', as in:

    p = re.compile('\\\\.*\\\\(.*)')

Ugly, no? Fortunately, Python has a special form for string literals,
called "raw", which suppresses Python's processing of \'s for escaping.
I think this was done expressly to help simplify entering re strings.
To use raw format for a string literal, just precede the opening
quotation mark with an r. Here is your original string, using a raw
literal:

    p = re.compile(r'\\.*\\(.*)')

This will compile ok.

(Sometimes these literals are referred to as "raw strings". I think this
is confusing, because new users think this is a special type of string,
different from str. This creates the EXACT SAME type of str; the r just
tells the compiler/interpreter to handle the quoted literal a little
differently. So I prefer to call them "raw literals".)

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
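
To see the two layers of escaping directly, here is a minimal sketch that prints what re actually receives from each form of the literal. It assumes Python 3 syntax (the traceback in the question is from Python 2.5), and the sample path string is hypothetical, chosen only to exercise the pattern.

```python
import re

# Three ways of writing the pattern as a Python literal.
plain = '\\.*\\(.*)'        # Python collapses each \\ to one \
doubled = '\\\\.*\\\\(.*)'  # "double-double": each \\\\ becomes \\
raw = r'\\.*\\(.*)'         # raw literal: backslashes kept as typed

# What the regex engine receives in each case.
print(plain)
print(doubled)
print(raw)

# A raw literal produces an ordinary str, equal to the doubled form.
same_value = (raw == doubled)
same_type = (type(raw) is str)

# The plain form reaches re with \( as a literal parenthesis,
# so the closing ) is unmatched and compilation fails.
compile_failed = False
try:
    re.compile(plain)
except re.error as exc:
    compile_failed = True
    print("re.error:", exc)

# The raw form compiles: literal backslash, anything, literal
# backslash, then a capturing group.
p = re.compile(raw)
sample = '\\dir\\file'      # hypothetical input: \dir\file
m = p.match(sample)
print(m.group(1))
```

Printing the literals before compiling them is usually the quickest way to tell which layer consumed a backslash.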