Not sure why you need the second re. Once the search has matched, you know you have "WPX/", an integer, and then comma-separated floats, so you could just do a split() on ",".
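Something along these lines is what I have in mind. It is only a sketch in Python 3 syntax: the wpx pattern is copied from your message (written as a raw string), while parse_wpx is a name I made up for illustration. It splits only the text the pattern actually matched, not the whole line.

```python
import re

# Validating pattern from the quoted message, as a raw string.
wpx = re.compile(r"WPX/(\d+)(,([-+]?\d+\.\d*e[-+]\d+))+")


def parse_wpx(line):
    """Return [index, float, ...] for a WPX line, or None if nothing matches.

    parse_wpx is a hypothetical helper name, not part of the original code.
    """
    mo = wpx.search(line)
    if mo is None:
        return None
    # mo.group(0) is the validated text "WPX/<n>,<float>,<float>,...".
    # Everything after the first comma is a float the pattern has accepted.
    fields = mo.group(0).split(",")[1:]
    return [int(mo.group(1))] + [float(f) for f in fields]


if __name__ == "__main__":
    sample = "WPX/1,8.2954231790e+006,1.0133209480e+005,1.7395780740e-004"
    print(parse_wpx(sample))
```

One difference from the findall() version: because only the matched text is split, a malformed field such as "-2.6-5" in your WPX/3 line ends the match there, so the fields after it are dropped instead of being picked up individually. Whether that is preferable depends on how you want bad records handled.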

On Sun, Aug 23, 2009 at 11:00 AM, Randolph Bentson <[email protected]> wrote:

> On Sun, Aug 23, 2009 at 09:02:53AM -0700, James Thiele wrote:
> > If I'm reading this correctly, you want to verify that the full string
> > matches "(AB+)+" and then print it followed by the submatches of "AB+".
> > Combining your code with Bryan's suggestion:
> >
> > #!/usr/bin/env python
> > import re
> > ptn = re.compile("^((AB+)+)$")
> > str = "ABABBABBBABBBBABBBBBABBBBBB"
> > if ptn.match(str):
> >     print str, re.findall('(AB+)', str)
>
> Thanks for your help. I had simplified my example, but this solves the
> core problem. Here's an extract from the actual data and my application
> of your suggestions:
>
> #!/usr/bin/env python
> import re
> wpx = re.compile("WPX/(\d+)(,([-+]?\d+\.\d*e[-+]\d+))+")
> floats = re.compile("[-+]?\d+\.\d*e[-+]\d+")
> #
> lines = [
>     "WPX/1,8.2954231790e+006,1.0133209480e+005,1.7395780740e-004",
>     "WPX/2,2.739e+06,3.301e+04,-8.822e+00,-4.688e+00,-1.443e-01,-6.109e-02",
>     "WPX/3,1.3e+5,6.2e+2,-1.7e-1,-1.8e+1,-4.3e-3,-2.1e-5,-7.4e-2,-2.6-5,7.2e-7,1.0e-6",
>     "Other stuff", ]
> info = {"WPX":[], }
> #
> for line in lines:
>     mo = wpx.search(line)
>     if mo:
>         info["WPX"].append([int(mo.group(1))]+map(float,floats.findall(line)))
>         continue
> #
> # ... much later ...
> #
> for value in info["WPX"]:
>     print value
>
> The technique works for this case, but it seems a bit fragile. I still
> wonder if there isn't a more robust method which would work for a messier
> collection of nested groups. Perhaps I'll have to revert to traditional
> parsing when that case appears.
>
> --
> Randolph Bentson
> [email protected]

--
No electrons were harmed in the creation of this email.
