Re: Problem with re module
John Harrington <beartiger@gmail.com> writes:

> I'm trying to use the following substitution,
>
>     lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n \2',lineList[i])
>
> I intend this to match any string \begin{document} that doesn't end
> in a line ending. If there's no line ending, then, I want to place two
> carriage returns between the string and the non-line end character.
>
> However, this places carriage returns even when the string is followed
> directly after with a line ending. Can someone explain to me why this
> match is not behaving as I intend it to, especially the ([^$])?

[^$] matches: not a $ character

You might want [^\n]

--
John Bokma                                                         j3b
Blog: http://johnbokma.com/
Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
--
http://mail.python.org/mailman/listinfo/python-list
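The difference between the two character classes can be checked directly. The following is a minimal sketch, written for Python 3 (the thread itself uses Python 2); the variable names and the two sample strings are illustrative assumptions, not taken from the poster's file.

```python
import re

# Inside [...], '$' is an ordinary character, so [^$] means
# "any single character except a literal dollar sign".
dollar_class = re.compile(r'(\\begin{document})([^$])')

# [^\n] means "any single character except a newline".
newline_class = re.compile(r'(\\begin{document})([^\n])')

# Hypothetical sample lines (assumptions for illustration).
at_line_end = "\\begin{document}\n"
before_text = "\\begin{document}Extra stuff"

# A newline is "not a dollar sign", so the original pattern matches it.
dollar_matches_at_line_end = dollar_class.search(at_line_end) is not None

# The newline is excluded here, so there is no match at a line end.
newline_matches_at_line_end = newline_class.search(at_line_end) is not None

# Ordinary trailing text is still matched.
newline_matches_before_text = newline_class.search(before_text) is not None

print(dollar_matches_at_line_end,
      newline_matches_at_line_end,
      newline_matches_before_text)
```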
Re: Problem with re module
John Harrington wrote:

> I'm trying to use the following substitution,
>
>     lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n \2',lineList[i])
>
> I intend this to match any string \begin{document} that doesn't end
> in a line ending. If there's no line ending, then, I want to place two
> carriage returns between the string and the non-line end character.
>
> However, this places carriage returns even when the string is followed
> directly after with a line ending. Can someone explain to me why this
> match is not behaving as I intend it to, especially the ([^$])?

Quoting http://docs.python.org/library/re.html:

    Special characters are not active inside sets. For example, [akm$]
    will match any of the characters 'a', 'k', 'm', or '$';

> Also, how can I write a regex that matches what I wish to match, as
> described above?

I think you want a negative lookahead assertion, (?!...):

    >>> print re.compile("(xxx)(?!$)", re.MULTILINE).sub(r"\1**", "aaa bbb xxx\naaa xxx bbb\nxxx")
    aaa bbb xxx
    aaa xxx** bbb
    xxx
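Applied to the original \begin{document} case, the negative lookahead idea might look like the sketch below. It is written for Python 3, and the sample text and variable names are assumptions made for illustration. Because the lookahead consumes no characters, no second group is needed in the replacement.

```python
import re

# (?!$) succeeds only where the position is NOT an end of line.
# With re.MULTILINE, '$' also matches just before each newline.
pattern = re.compile(r'(\\begin{document})(?!$)', re.MULTILINE)

# Hypothetical input: one occurrence at a line end, one followed by text.
text = (
    "\\begin{document}\n"
    "This line stays where it is.\n"
    "\\begin{document}Extra stuff\n"
)

# '\n' in the replacement template is turned into a real newline by re.
result = pattern.sub(r'\1\n\n', text)
print(result)
```

Only the second occurrence is changed; the first is left alone because the lookahead fails at the end of the line.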
Re: Problem with re module
On Mar 22, 11:16 am, John Bokma <j...@castleamber.com> wrote:
> John Harrington <beartiger@gmail.com> writes:
>
> > I'm trying to use the following substitution,
> >
> >     lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n \2',lineList[i])
> >
> > I intend this to match any string \begin{document} that doesn't end
> > in a line ending. If there's no line ending, then, I want to place two
> > carriage returns between the string and the non-line end character.
> >
> > However, this places carriage returns even when the string is followed
> > directly after with a line ending. Can someone explain to me why this
> > match is not behaving as I intend it to, especially the ([^$])?
>
> [^$] matches: not a $ character
>
> You might want [^\n]

Thank you, John. I thought that when you use r before the regex, $
matches an end of line. But, in any case, if I use [^\n] as you suggest
I get the same result.

Here's a script that illustrates the problem. Any help would be
appreciated!:

    #BEGIN SCRIPT
    import re

    outlist = []
    myfile = "raw.tex"

    fin = open(myfile, "r")
    lineList = fin.readlines()
    fin.close()

    for i in range(0,len(lineList)):
        lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n \2',lineList[i])
        outlist.append(lineList[i])

    fou = open(myfile, "w")
    for i in range(len(outlist)):
        fou.write(outlist[i])
    fou.close()
    #END SCRIPT

And the file raw.tex:

    %BEGIN TeX FILE
    \begin{document}
    This line should remain right after the above line in the output, but doesn't
    \begin{document}Extra stuff here should appear below the begin line and does in the output.
    %END TeX FILE
Re: Problem with re module
On Tue, Mar 22, 2011 at 2:40 PM, John Harrington <beartiger@gmail.com> wrote:
> On Mar 22, 11:16 am, John Bokma <j...@castleamber.com> wrote:
> > John Harrington <beartiger@gmail.com> writes:
> >
> > > I'm trying to use the following substitution,
> > >
> > >     lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n \2',lineList[i])
> > >
> > > I intend this to match any string \begin{document} that doesn't end
> > > in a line ending. If there's no line ending, then, I want to place
> > > two carriage returns between the string and the non-line end
> > > character.
> > >
> > > However, this places carriage returns even when the string is
> > > followed directly after with a line ending. Can someone explain to
> > > me why this match is not behaving as I intend it to, especially the
> > > ([^$])?
> >
> > [^$] matches: not a $ character
> >
> > You might want [^\n]
>
> Thank you, John. I thought that when you use r before the regex, $
> matches an end of line. But, in any case, if I use [^\n] as you suggest
> I get the same result.

r before a string has nothing to do with regexes. It signals a raw
string: escape sequences won't be interpreted.

    >>> print 'a\tb'
    a       b
    >>> print r'a\tb'
    a\tb

We use raw strings for regexes because otherwise, you'd have to remember
to double up all your backslashes. And double up your doubled up
backslashes when you really want a backslash.

> Here's a script that illustrates the problem. Any help would be
> appreciated!:
>
>     #BEGIN SCRIPT
>     import re
>
>     outlist = []
>     myfile = "raw.tex"
>
>     fin = open(myfile, "r")
>     lineList = fin.readlines()
>     fin.close()
>
>     for i in range(0,len(lineList)):
>         lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n \2',lineList[i])
>         outlist.append(lineList[i])
>
>     fou = open(myfile, "w")
>     for i in range(len(outlist)):
>         fou.write(outlist[i])
>     fou.close()
>     #END SCRIPT
>
> And the file raw.tex:
>
>     %BEGIN TeX FILE
>     \begin{document}
>     This line should remain right after the above line in the output, but doesn't
>     \begin{document}Extra stuff here should appear below the begin line and does in the output.
>     %END TeX FILE

Works for me. Do you have a space after the \begin{document} or
something? Because that gets moved. You might want to check for
non-whitespace characters in the regex instead of just non-newlines.
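The point about raw strings can be verified without any regex at all. A minimal sketch, written for Python 3 (so print is a function, unlike in the thread's Python 2 session); the variable names are illustrative assumptions.

```python
# In a normal string literal, \t is an escape sequence for one tab character.
plain = 'a\tb'

# In a raw string literal, the backslash and the 't' are kept as two characters.
raw = r'a\tb'

plain_length = len(plain)  # 'a', TAB, 'b'
raw_length = len(raw)      # 'a', '\', 't', 'b'

# Without the r prefix, every backslash meant for the regex engine
# has to be doubled at the string-literal level.
doubled = '(\\\\begin{document})'
raw_pattern = r'(\\begin{document})'
same_pattern = doubled == raw_pattern

print(plain_length, raw_length, same_pattern)
```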
Re: Problem with re module
On Mar 22, 12:07 pm, Benjamin Kaplan <benjamin.kap...@case.edu> wrote:
> On Tue, Mar 22, 2011 at 2:40 PM, John Harrington <beartiger@gmail.com> wrote:
> > On Mar 22, 11:16 am, John Bokma <j...@castleamber.com> wrote:
> > > John Harrington <beartiger@gmail.com> writes:
> > >
> > > > I'm trying to use the following substitution,
> > > >
> > > >     lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n \2',lineList[i])
> > > >
> > > > I intend this to match any string \begin{document} that doesn't
> > > > end in a line ending. If there's no line ending, then, I want to
> > > > place two carriage returns between the string and the non-line
> > > > end character.
> > > >
> > > > However, this places carriage returns even when the string is
> > > > followed directly after with a line ending. Can someone explain
> > > > to me why this match is not behaving as I intend it to,
> > > > especially the ([^$])?
> > >
> > > [^$] matches: not a $ character
> > >
> > > You might want [^\n]
> >
> > Thank you, John. I thought that when you use r before the regex, $
> > matches an end of line. But, in any case, if I use [^\n] as you
> > suggest I get the same result.
>
> r before a string has nothing to do with regexes. It signals a raw
> string: escape sequences won't be interpreted.
>
>     >>> print 'a\tb'
>     a       b
>     >>> print r'a\tb'
>     a\tb
>
> We use raw strings for regexes because otherwise, you'd have to
> remember to double up all your backslashes. And double up your doubled
> up backslashes when you really want a backslash.
>
> > Here's a script that illustrates the problem. Any help would be
> > appreciated!:
> >
> >     #BEGIN SCRIPT
> >     import re
> >
> >     outlist = []
> >     myfile = "raw.tex"
> >
> >     fin = open(myfile, "r")
> >     lineList = fin.readlines()
> >     fin.close()
> >
> >     for i in range(0,len(lineList)):
> >         lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n \2',lineList[i])
> >         outlist.append(lineList[i])
> >
> >     fou = open(myfile, "w")
> >     for i in range(len(outlist)):
> >         fou.write(outlist[i])
> >     fou.close()
> >     #END SCRIPT
> >
> > And the file raw.tex:
> >
> >     %BEGIN TeX FILE
> >     \begin{document}
> >     This line should remain right after the above line in the output, but doesn't
> >     \begin{document}Extra stuff here should appear below the begin line and does in the output.
> >     %END TeX FILE
>
> Works for me. Do you have a space after the \begin{document} or
> something? Because that gets moved. You might want to check for
> non-whitespace characters in the regex instead of just non-newlines.

Matching the non-whitespace works, but I'm troubled I can't match a
non-end-of-line. No, there was no space after the string.

Thank you for your help, Ben
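The thread does not establish why [^\n] failed while a non-whitespace match worked. One possible cause, offered here purely as an assumption, is an invisible carriage return before the newline (for example a file with \r\n line endings read without newline translation). The sketch below, in Python 3, builds such a line in memory to show how the two patterns would differ in that case; the sample line and variable names are hypothetical.

```python
import re

# Hypothetical line with a Windows-style ending. This is an assumption:
# the thread never confirms what actually followed \begin{document}.
line = "\\begin{document}\r\n"

# [^\n] accepts the carriage return, so the substitution fires and the
# blank lines are inserted even though no visible text follows.
with_not_newline = re.sub(r'(\\begin{document})([^\n])', r'\1\n\n\2', line)

# \S rejects any whitespace, including '\r', so the line is left alone.
with_non_space = re.sub(r'(\\begin{document})(\S)', r'\1\n\n\2', line)

print(repr(with_not_newline))
print(repr(with_non_space))
```

Printing repr() of the lines read from the real file would show whether a stray \r or other invisible character is actually present.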
Re: Problem with re module
John Harrington wrote:

> Here's a script that illustrates the problem. Any help would be
> appreciated!:
>
>     #BEGIN SCRIPT
>     import re
>
>     outlist = []
>     myfile = "raw.tex"
>
>     fin = open(myfile, "r")
>     lineList = fin.readlines()
>     fin.close()
>
>     for i in range(0,len(lineList)):
>         lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n \2',lineList[i])
>         outlist.append(lineList[i])
>
>     fou = open(myfile, "w")
>     for i in range(len(outlist)):
>         fou.write(outlist[i])
>     fou.close()
>     #END SCRIPT
>
> And the file raw.tex:
>
>     %BEGIN TeX FILE
>     \begin{document}
>     This line should remain right after the above line in the output, but doesn't
>     \begin{document}Extra stuff here should appear below the begin line and does in the output.
>     %END TeX FILE

Here's the important tidbit:

    re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)

From the docs:

    '.'
        (Dot.) In the default mode, this matches any character except a
        newline. If the DOTALL flag has been specified, this matches any
        character including a newline.

    '+'
        Causes the resulting RE to match 1 or more repetitions of the
        preceding RE. ab+ will match ‘a’ followed by any non-zero number
        of ‘b’s; it will not match just ‘a’.

And here's the entire program, a bit more pythonically:

    8<---
    import re

    outlist = []
    myfile = "raw.tex"

    fin = open(myfile, "r")
    lineList = fin.readlines()
    fin.close()

    for line in lineList:
        line = re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
        outlist.append(line)

    fou = open(myfile, "w")
    for line in outlist:
        fou.write(line)
    fou.close()
    8<---

Hope this helps!

~Ethan~
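The (.+) substitution above can be tried without touching raw.tex. The following is a small sketch in Python 3 that applies the same pattern to an in-memory list of lines; the helper name split_after_begin and the shortened sample lines are assumptions made for illustration.

```python
import re

# Same pattern as in the message above: \begin{document} followed by
# at least one character that is not a newline.
BEGIN_WITH_TRAILING_TEXT = re.compile(r'(\\begin{document})(.+)')


def split_after_begin(lines):
    """Return a new list with two newlines inserted after \\begin{document}
    wherever other text follows it on the same line (hypothetical helper)."""
    return [BEGIN_WITH_TRAILING_TEXT.sub(r'\1\n\n\2', line) for line in lines]


# Shortened stand-in for the contents of raw.tex.
sample = [
    "%BEGIN TeX FILE\n",
    "\\begin{document}\n",
    "This line should remain right after the above line\n",
    "\\begin{document}Extra stuff here\n",
    "%END TeX FILE\n",
]

result = split_after_begin(sample)
print("".join(result), end="")
```

Note that '.' also matches a carriage return, so if a line carried a hidden \r before the newline this pattern would still fire on it.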