Re: matching patterns after regex?
On Aug 12, 12:53 pm, Bernard <bernard.ch...@gmail.com> wrote:
> On 12 Aug, 06:15, Martin <mdeka...@gmail.com> wrote:
> > Hi,
> >
> > I have a string (see below) and ideally I would like to pull out the
> > decimal number which follows the bounding coordinate information.
> > For example, given this string:
> >
> >     s = '\nGROUP = ARCHIVEDMETADATA\n GROUPTYPE= MASTERGROUP\n\n GROUP = BOUNDINGRECTANGLE\n\nOBJECT = NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 19.82039\nEND_OBJECT = NORTHBOUNDINGCOORDINATE\n\nOBJECT = SOUTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 9.910197\nEND_OBJECT = SOUTHBOUNDINGCOORDINATE\n\nOBJECT = EASTBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 10.6506458717851\nEND_OBJECT = EASTBOUNDINGCOORDINATE\n\nOBJECT = WESTBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 4.3188348375893e-15\nEND_OBJECT = WESTBOUNDINGCOORDINATE\n\n END_GROUP
> >
> > ideally I would return:
> >
> >     NORTHBOUNDINGCOORDINATE = 19.82039
> >     SOUTHBOUNDINGCOORDINATE = 9.910197
> >     EASTBOUNDINGCOORDINATE = 10.6506458717851
> >     WESTBOUNDINGCOORDINATE = 4.3188348375893e-15
> >
> > So far I have only managed to extract the numbers by doing
> >
> >     re.findall("[\d.]*\d", s)
> >
> > which returns
> >
> >     ['1', '19.82039', '1', '9.910197', '1', '10.6506458717851', '1', '4.3188348375893', '15', ...]
> >
> > The first problem I can see is that my pattern chops off the e-15
> > part, and I am not sure how to allow for that in the pattern. Does
> > anyone have any suggestions as to how I could also match this?
> >
> > Ideally I would have a statement which printed the number between
> > the two bounding coordinate strings, for example
> >
> >     NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 19.82039\nEND_OBJECT = NORTHBOUNDINGCOORDINATE\n\n
> >
> > that is, something that matched NORTHBOUNDINGCOORDINATE and printed
> > the decimal number before it hit the next NORTHBOUNDINGCOORDINATE.
> > But I am not sure how to do this. Any suggestions would be
> > appreciated.
> >
> > Many thanks
> > Martin
>
> Hey Martin,
>
> here's a regex I've just tested:
>
>     (\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)
>
> The first group captures the whateverBOUNDINGCOORDINATE name and the
> second group captures the value. Please provide some more entries if
> you'd like me to test my regex some more :)
>
> cheers
> Bernard

Thanks Bernard, it doesn't seem to be working for me. I tried

    re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

Is that what you meant? Apologies if not. It results in a syntax error:

    In [557]: re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
      File "<ipython console>", line 1
        re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
                                                                ^
    SyntaxError: unexpected character after line continuation character

Thanks

--
http://mail.python.org/mailman/listinfo/python-list
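For the question asked above (getting each bounding coordinate together with its value, including values such as 4.3188348375893e-15), one possible approach is sketched below. It is not the pattern anyone in the thread posted. It assumes the text contains real newlines and that every OBJECT block has the NUM_VAL line followed by the VALUE line, as in the quoted string. The names `SAMPLE`, `NUMBER`, `COORD_RE` and `bounding_coordinates` are made up for this illustration.

```python
import re

# Sample text modeled on the string quoted in the thread (real newlines assumed).
SAMPLE = (
    "\nGROUP = ARCHIVEDMETADATA\n GROUPTYPE= MASTERGROUP\n\n"
    " GROUP = BOUNDINGRECTANGLE\n\n"
    "OBJECT = NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 19.82039\n"
    "END_OBJECT = NORTHBOUNDINGCOORDINATE\n\n"
    "OBJECT = SOUTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 9.910197\n"
    "END_OBJECT = SOUTHBOUNDINGCOORDINATE\n\n"
    "OBJECT = EASTBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 10.6506458717851\n"
    "END_OBJECT = EASTBOUNDINGCOORDINATE\n\n"
    "OBJECT = WESTBOUNDINGCOORDINATE\n NUM_VAL = 1\n VALUE= 4.3188348375893e-15\n"
    "END_OBJECT = WESTBOUNDINGCOORDINATE\n\n END_GROUP"
)

# A number with optional sign, optional fraction and optional exponent,
# so that "4.3188348375893e-15" is captured whole.
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# Match "OBJECT = <name>", skip the NUM_VAL line, then capture the number
# after "VALUE =". \s* around "=" tolerates missing or extra spaces.
COORD_RE = re.compile(
    r"\bOBJECT\s*=\s*(\w+BOUNDINGCOORDINATE)\s+"
    r"NUM_VAL\s*=\s*\d+\s+"
    r"VALUE\s*=\s*(" + NUMBER + r")"
)


def bounding_coordinates(text):
    """Return a dict mapping each coordinate name to its value as a float."""
    return {name: float(value) for name, value in COORD_RE.findall(text)}


coords = bounding_coordinates(SAMPLE)
for name, value in coords.items():
    print(name, "=", value)
```

Anchoring on `OBJECT = ...` and stepping over the NUM_VAL line explicitly avoids relying on `.*`, which by default does not match across newlines.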
Re: matching patterns after regex?
On Wed, 12 Aug 2009 05:12:22 -0700, Martin wrote:
> I tried
>
>     re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

You need to put quotes around strings. In this case, because you're using regular expressions, you should use a raw string:

    re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)", s)

will probably work.

--
Steven
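As a small illustration of the raw-string advice above: in a raw string literal the backslash is kept as an ordinary character, so the regex engine receives `\s` and `\d` untouched. The pattern and the sample text in this sketch are made up for the demonstration and are not the ones from the thread.

```python
import re

# In a raw string, a backslash stays a backslash.
assert r"\s" == "\\s"      # two characters: backslash and "s"
assert len(r"\n") == 2     # backslash + "n"
assert len("\n") == 1      # a single newline character

# The pattern must be a string; a raw string keeps \s and \d intact
# for the regex engine to interpret.
pattern = r"VALUE\s*=\s*([\d.]+)"
values = re.findall(pattern, "NUM_VAL = 1\n VALUE= 19.82039\n")
print(values)  # ['19.82039']
```

Without the quotes, Python tries to parse the backslashes as line continuations, which is where the SyntaxError in the earlier message comes from.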
Re: matching patterns after regex?
On Aug 12, 1:23 pm, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> You need to put quotes around strings. In this case, because you're
> using regular expressions, you should use a raw string

Thanks, I see. I tried it, and if I use it as it is, it matches the first instance:

    In [594]: re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)", s)
    Out[594]: [('NORTHBOUNDINGCOORDINATE', '1')]

So I adjusted the first part of the regex, on the basis that I could substitute NORTH for SOUTH etc.:

    In [595]: re.findall(r"(NORTHBOUNDINGCOORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)", s)
    Out[595]: [('NORTHBOUNDINGCOORDINATE', '1')]

But in both cases it doesn't return the decimal value. It returns the value that comes after NUM_VAL = rather than the one after VALUE = ?
Re: matching patterns after regex?
On Aug 12, 1:42 pm, Martin <mdeka...@gmail.com> wrote:
> But in both cases it doesn't return the decimal value rather the value
> that comes after NUM_VAL = , rather than VALUE = ?

I think I kind of got that to work, but I am clearly not quite understanding how it works, because I tried to use it again to match something else. In this case I want to print the values 0.00 and 2223901.039333 from a string like this:

    YDim=1200\n\t\tUpperLeftPointMtrs=(0.00,2223901.039333)\n\t\t

I tried the following, which I thought would match the statement and print the decimal number after the equals sign:

    re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)

where s is the string.

Many thanks for the help
Re: matching patterns after regex?
On 12 Aug, 12:43, Martin <mdeka...@gmail.com> wrote:
> In this case I want to print the values 0.00 and 2223901.039333 from a
> string like this...
>
>     YDim=1200\n\t\tUpperLeftPointMtrs=(0.00,2223901.039333)\n\t\t

You have to do it with 2 groups in the same regex:

    regex = r"UpperLeftPointMtrs=\(([\d\.]+),([\d\.]+)"

The first group captures what comes before the comma and the second one what comes after it :)

You should probably learn how to play with regexes. I personally use a visual tool called RX Toolkit[1] that comes with Komodo IDE.

[1] http://docs.activestate.com/komodo/4.4/regex.html
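A minimal sketch of the two-group approach, assuming the sample string contains real newline and tab characters. The helper name `upper_left_point` is hypothetical, and the closing `\)` is an addition to the pattern given in the message. The sketch also runs the earlier attempt on the same text to show why it finds nothing: it requires whitespace right after `=`, and here `=` is followed directly by `(`.

```python
import re

s = "YDim=1200\n\t\tUpperLeftPointMtrs=(0.00,2223901.039333)\n\t\t"

# Two capture groups: one on each side of the comma inside the parentheses.
POINT_RE = re.compile(r"UpperLeftPointMtrs=\(([\d.]+),([\d.]+)\)")


def upper_left_point(text):
    """Return (x, y) as floats, or None if the field is not present."""
    match = POINT_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


point = upper_left_point(s)
print(point)  # (0.0, 2223901.039333)

# The earlier attempt needs a whitespace character after "=", so it
# matches nothing in this text.
attempt = re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)
print(attempt)  # []
```

The character class `[\d.]+` is enough for plain decimals like these. Values with a sign or an exponent would need a wider number pattern.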
Re: matching patterns after regex?
Bernard wrote:
> You should probably learn how to play with regexes. I personally use a
> visual tool called RX Toolkit[1] that comes with Komodo IDE.
>
> [1] http://docs.activestate.com/komodo/4.4/regex.html

Haven't tried it myself but how about this?

http://re-try.appspot.com/

--
Kindest regards.

Mark Lawrence.
Re: matching patterns after regex?
On Aug 12, 10:29 pm, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
> Haven't tried it myself but how about this?
> http://re-try.appspot.com/

Thanks Mark and Bernard. I have managed to get it working and I appreciate the help with understanding the syntax. The web links are also very useful, I'll give them a go.

Martin