On 04/01/2011 19:37, Jeremy wrote:
On Tuesday, January 4, 2011 11:26:48 AM UTC-7, MRAB wrote:
On 04/01/2011 17:11, Jeremy wrote:
I am trying to write a regular expression that finds and deletes (replaces with
nothing) comments in a string/file. A comment is either a line whose first
non-whitespace character is a 'c', or anything from a dollar sign to the end of
the line. Replacing these comments with nothing isn't too hard. The trouble
is that each removed comment leaves its new-line behind; the new-line isn't
being captured by the regular expression.
Below, I have copied a minimal example. Can someone help?
Thanks,
Jeremy
import re
text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""
commentPattern = re.compile("""
(^\s*?c\s*?.*?| # Comment start with c or C
\$.*?)$\n # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)
Part of the problem is that you're not using raw string literals or
doubling the backslashes.
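A hedged illustration of the specific symptom, assuming Python 3. In a normal (non-raw) string, Python converts `\n` into a real newline character before `re` ever sees it, and `re.VERBOSE` ignores unescaped whitespace outside character classes. So the `$\n` in the original pattern effectively becomes just `$`, and the newline is never consumed. (The `\s` and `\$` escapes survive only because Python leaves unknown escapes alone; recent Python versions warn about them.) The toy text and pattern below are hypothetical, chosen only to isolate this effect:

```python
import re

text = "c comment\nkeep me\n"

# Non-raw: Python turns "\n" into an actual newline character, which
# re.VERBOSE then discards as insignificant whitespace.
cooked = re.compile("^c.*$\n", re.VERBOSE | re.MULTILINE)

# Raw: the regex engine receives backslash + "n" and matches a newline.
raw = re.compile(r"^c.*$\n", re.VERBOSE | re.MULTILINE)

print(repr(cooked.sub("", text)))  # '\nkeep me\n'  (blank line left behind)
print(repr(raw.sub("", text)))     # 'keep me\n'
```

The first result reproduces the leftover blank line described in the question; the raw-string version removes the comment together with its newline.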
Try something like this:
commentPattern = re.compile(r"""
(^[ \t]*c.*\n| # Comment start with c or C
[ \t]*\$.*) # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)
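A minimal sketch of applying the suggested pattern to the sample text from the question with `re.sub` (the variable name `cleaned` is just for illustration). Note that in verbose mode the space inside `[ \t]` is kept, because whitespace inside a character class is not ignored:

```python
import re

text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

commentPattern = re.compile(r"""
(^[ \t]*c.*\n|   # Comment start with c or C
[ \t]*\$.*)      # Comment starting with $
""", re.VERBOSE | re.MULTILINE | re.IGNORECASE)

# Full-line comments are removed together with their newline; the
# inline "$" comment is removed along with the whitespace before it.
cleaned = commentPattern.sub("", text)
print(repr(cleaned))  # 'F44:N 2'
```

One caveat: because the first alternative requires a trailing `\n`, a 'c' comment on the very last line of a file with no final newline would not be removed. Making the newline optional (`\n?`) is one possible adjustment, though that is my suggestion rather than something tested in this thread.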
Using a raw string literal fixed the problem for me. Thanks for the
suggestion. Why is that so important?
Regexes often use escape sequences, but so do string literals, and a
sequence which is intended for the regex engine might not get passed
along correctly. For example, in a normal string literal \b means
'backspace' and will be passed to the regex engine as that; in a regex
it usually means 'word boundary':
A regex for "the" as a word: \bthe\b
As a raw string literal: r"\bthe\b"
As a normal string literal: "\\bthe\\b"
Written as a normal string literal without doubling, "\bthe\b" means: backspace + "the" + backspace
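A short sketch checking the three spellings above in Python 3; the sample sentence is hypothetical:

```python
import re

sentence = "the theme of the day"

raw_pattern = r"\bthe\b"        # regex engine receives backslash + "b"
doubled_pattern = "\\bthe\\b"   # doubling gives the identical string
cooked_pattern = "\bthe\b"      # Python turns "\b" into backspace (\x08)

print(raw_pattern == doubled_pattern)        # True
print(cooked_pattern == "\x08the\x08")       # True
print(re.findall(raw_pattern, sentence))     # ['the', 'the']
print(re.findall(cooked_pattern, sentence))  # []
```

The word-boundary version skips the "the" inside "theme", while the backspace version matches nothing because the sentence contains no backspace characters.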
--
http://mail.python.org/mailman/listinfo/python-list