Re: Problem with re module

2011-03-22 Thread John Bokma
John Harrington beartiger@gmail.com writes:

 I'm trying to use the following substitution,

  lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])

 I intend this to match any string \begin{document} that doesn't end
 in a line ending.  If there's no line ending, then, I want to place
 two carriage returns between the string and the non-line end
 character.

 However, this places carriage returns even when the string is followed
 directly after with a line ending.  Can someone explain to me why this
 match is not behaving as I intend it to, especially the ([^$])?

[^$] matches: not a $ character

You might want [^\n]
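
To make the difference concrete, here is a minimal sketch (written for Python 3; the sample strings and variable names are made up for illustration). It suggests that [^$] rejects only a literal dollar sign, so it also matches a newline, while [^\n] rejects the newline:

```python
import re

# Inside a character class, $ is an ordinary character, so [^$] means
# "any single character except a literal dollar sign" (newline included).
dollar_class = re.compile(r'(\\begin\{document\})([^$])')
newline_class = re.compile(r'(\\begin\{document\})([^\n])')

at_line_end = '\\begin{document}\n'      # hypothetical sample lines
before_text = '\\begin{document}Extra\n'

print(bool(dollar_class.search(at_line_end)))   # True: the newline is "not a $"
print(bool(newline_class.search(at_line_end)))  # False: only a newline follows
print(bool(newline_class.search(before_text)))  # True: 'E' follows directly
```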

-- 
John Bokma   j3b

Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with re module

2011-03-22 Thread Peter Otten
John Harrington wrote:

 I'm trying to use the following substitution,
 
  lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
 
 I intend this to match any string \begin{document} that doesn't end
 in a line ending.  If there's no line ending, then, I want to place
 two carriage returns between the string and the non-line end
 character.
 
 However, this places carriage returns even when the string is followed
 directly after with a line ending.  Can someone explain to me why this
 match is not behaving as I intend it to, especially the ([^$])?

Quoting http://docs.python.org/library/re.html:

Special characters are not active inside sets. For example, [akm$] will 
match any of the characters 'a', 'k', 'm', or '$';


 Also, how can I write a regex that matches what I wish to match, as
 described above?

I think you want a negative lookahead assertion, (?!...):

>>> print re.compile("(xxx)(?!$)", re.MULTILINE).sub(r"\1**", "aaa bbb xxx\naaa xxx bbb\nxxx")
aaa bbb xxx
aaa xxx** bbb
xxx
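
Applied to the \begin{document} case from the question, the same lookahead idea might look like the sketch below. It is written for Python 3, and the sample text is a hypothetical stand-in for the real file. Because a lookahead consumes nothing, no second group is needed in the replacement:

```python
import re

# Negative lookahead: match \begin{document} only when it is NOT at the
# end of a line. With re.MULTILINE, $ matches just before each newline.
pattern = re.compile(r'(\\begin\{document\})(?!$)', re.MULTILINE)

# Hypothetical sample modelled on the case described in the question.
text = '\\begin{document}\nkeep\n\n\\begin{document}Extra stuff\n'

result = pattern.sub(r'\1\n\n', text)
print(result)
```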




Re: Problem with re module

2011-03-22 Thread John Harrington
On Mar 22, 11:16 am, John Bokma j...@castleamber.com wrote:
 [^$] matches: not a $ character

 You might want [^\n]

Thank you, John.

I thought that when you use r before the regex, $ matches an end of
line.  But, in any case, if I use [^\n] as you suggest I get the
same result.

Here's a script that illustrates the problem.  Any help would be
appreciated!:

#BEGIN SCRIPT
import re

outlist = []
myfile  = "raw.tex"

fin = open(myfile, "r")
lineList = fin.readlines()
fin.close()

for i in range(0,len(lineList)):

    lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n\2',lineList[i])

    outlist.append(lineList[i])

fou = open(myfile, "w")
for i in range(len(outlist)):
    fou.write(outlist[i])
fou.close()
#END SCRIPT

And the file raw.tex:

%BEGIN TeX FILE
\begin{document}
This line should remain right after the above line in the output, but
doesn't

\begin{document}Extra stuff here should appear below the begin line
and does in the output.
%END TeX FILE


Re: Problem with re module

2011-03-22 Thread Benjamin Kaplan
On Tue, Mar 22, 2011 at 2:40 PM, John Harrington
beartiger@gmail.com wrote:
 Thank you, John.

 I thought that when you use r before the regex, $ matches an end of
 line.  But, in any case, if I use [^\n] as you suggest I get the
 same result.



r before a string has nothing to do with regexes. It signals a raw
string: backslash escape sequences won't be interpreted.
>>> print 'a\tb'
a       b
>>> print r'a\tb'
a\tb

We use raw strings for regexes because otherwise you'd have to
remember to double up all your backslashes, and double up your
doubled-up backslashes when you really want a literal backslash.
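
The doubling can be seen in a short sketch (Python 3 syntax; the variable names are illustrative only). It compares a regular and a raw literal, then shows two spellings of the same pattern for a literal backslash followed by "begin":

```python
import re

# A regular literal turns \t into a tab; a raw literal keeps both characters.
print(len('a\tb'))    # 3
print(len(r'a\tb'))   # 4

# To match one literal backslash, the regex engine must see two of them.
# Without a raw string that takes four backslashes in the source code.
plain = '\\\\begin'
raw = r'\\begin'
print(plain == raw)   # True

print(bool(re.search(raw, r'\begin{document}')))   # True
```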

 Here's a script that illustrates the problem.  Any help would be
 appreciated!:


Works for me. Do you have a space after the \begin{document} or
something? Because that gets moved. You might want to check for
non-whitespace characters in the regex instead of just non-newlines.
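
A rough sketch of that suggestion, in Python 3, compares [^\n] with \S on two hypothetical lines. The trailing-space case is the one raised above. The carriage-return case is purely an assumption about how such a line could look, not something confirmed in this thread:

```python
import re

any_non_newline = re.compile(r'(\\begin\{document\})([^\n])')
non_whitespace = re.compile(r'(\\begin\{document\})(\S)')

# Hypothetical lines: one with a trailing space, one with a carriage
# return left over from a CRLF line ending (an assumption, not something
# confirmed in this thread).
samples = ['\\begin{document} \n', '\\begin{document}\r\n']

for line in samples:
    moved = any_non_newline.sub(r'\1\n\n\2', line)
    kept = non_whitespace.sub(r'\1\n\n\2', line)
    print(repr(line), '->', repr(moved), 'vs', repr(kept))
```

In both samples the [^\n] version inserts the blank line because an invisible character follows the marker, while the \S version leaves the line alone.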



Re: Problem with re module

2011-03-22 Thread John Harrington
On Mar 22, 12:07 pm, Benjamin Kaplan benjamin.kap...@case.edu wrote:
 Works for me. Do you have a space after the \begin{document} or
 something? Because that gets moved. You might want to check for
 non-whitespace characters in the regex instead of just non-newlines.



Matching the non-whitespace works, but I'm troubled I can't match a
non-end-of-line.  No, there was no space after the string.

Thank you for your help, Ben



Re: Problem with re module

2011-03-22 Thread Ethan Furman

John Harrington wrote:

Here's a script that illustrates the problem.  Any help would be
appreciated!:



Here's the important tidbit:

re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)

From the docs:
'.'
(Dot.) In the default mode, this matches any character except a newline. 
If the DOTALL flag has been specified, this matches any character 
including a newline.


'+'
Causes the resulting RE to match 1 or more repetitions of the preceding 
RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will 
not match just ‘a’.
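
Put together, those two rules mean (.+) can only match when at least one non-newline character follows the marker. A small Python 3 check on two made-up lines illustrates this:

```python
import re

pattern = re.compile(r'(\\begin\{document\})(.+)')

# .+ needs at least one non-newline character after the marker, so a
# line that ends right after \begin{document} is left untouched.
print(repr(pattern.sub(r'\1\n\n\2', '\\begin{document}\n')))
print(repr(pattern.sub(r'\1\n\n\2', '\\begin{document}Extra stuff\n')))
```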



And here's the entire program, a bit more pythonically:

8---
import re

outlist = []
myfile  = "raw.tex"

fin = open(myfile, "r")
lineList = fin.readlines()
fin.close()

for line in lineList:
    line = re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
    outlist.append(line)

fou = open(myfile, "w")
for line in outlist:
    fou.write(line)
fou.close()
8---
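
For readers on Python 3, one possible variation on the program above uses with blocks so the files are closed even if an error occurs. This is only a sketch: the names split_after_begin and rewrite_file are invented here and do not come from the thread.

```python
import re

BEGIN_WITH_TRAILING_TEXT = re.compile(r'(\\begin\{document\})(.+)')

def split_after_begin(lines):
    """Return the lines with text after \\begin{document} moved down."""
    return [BEGIN_WITH_TRAILING_TEXT.sub(r'\1\n\n\2', line) for line in lines]

def rewrite_file(path):
    # Read everything first, then overwrite the same file in place.
    with open(path, 'r') as fin:
        lines = fin.readlines()
    with open(path, 'w') as fou:
        fou.writelines(split_after_begin(lines))
```

Calling rewrite_file('raw.tex') would then process the sample file in place. Keeping the substitution in its own function makes it easy to try on a list of strings without touching any file.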

Hope this helps!

~Ethan~