On Aug 30, 8:52 am, naugiedoggie <michael.a.p...@gmail.com> wrote:
> On Aug 29, 1:14 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
>
> > On 29/08/2010 15:22, naugiedoggie wrote:
> > > I'm having a problem with using a function as the replacement in
> > > re.sub().
> > > Here is the function:
> > > def normalize(s) :
> > >     return urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))
> > This normalises the provider and returns only that, and none of the
> > remainder of the string.
>
> > I think you might want this:
>
> > def normalize(s):
> >     return s[ : s.start('provider')] +
> >         urllib.quote(string.capwords(urllib.unquote(s.group('provider')))) +
> >         s[s.start('provider') : ]
>
> > It returns the part before the provider, followed by the normalised
> > provider, and then the part after the provider.
>
> Hello,
>
> Thanks for the reply.
>
> There must be something basic about the re.sub() function that I'm
> missing. The documentation shows this example:
>
> <code>
> >>> def dashrepl(matchobj):
> ...     if matchobj.group(0) == '-': return ' '
> ...     else: return '-'
> >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
> 'pro--gram files'
> >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
> 'Baked Beans & Spam'
> </code>
>
> According to the doc, the modifying function takes one parameter, the
> MatchObject. The re.sub function takes only a compiled regex object
> or a pattern, generates a MatchObject from that object/pattern and
> passes the MatchObject to the given function. Notice that in the
> examples, the re.sub() returns the entire line, with the changes made.
> But the function itself returns only the change. What is happening
> for me is that, if I have a line that contains
> &Search_Provider=chen&p=value, the processed line ends up with
> &Chen&p=value.
>
> Now, I did follow up with your suggestion. `s' is actually a
> MatchObject (bad param naming on my part, I started out passing a
> string into the function and then changed it to a MatchObject, but
> didn't change the param name), so I made the following change:
>
> <code>
>     return line[s.pos : s.start('provider')] + \
>         urllib.quote(string.capwords(urllib.unquote(s.group('provider')))) + \
>         line[s.end('provider') : ]
> </code>
>
> In order to make this work (finally), I had to make the processing
> function look like this:
>
> <code>
> def processLine(l) :
>     global line
>     line = l
>     provider = getProvider(line)
>     if provider == "No Provider" : return line
>     scenario = getScenario(line)
>     if filter (lambda a: a != None, [getOrg(s,scenario) for s in orgs]) == [] :
>         line = re.sub(provider_pattern,normalize,line)
>     else :
>         line.replace(provider_parameter, org_parameter)
>     return line
> </code>
>
> And then the call:
>
> <code>
> lines = fileReader.readlines()
> [ fileWriter.write(l) for l in [processLine(l) for l in lines]]
> </code>
>
> Without this complicated gobbledigook, I could not get the correct
> result. I hate global vars and I completely do not understand why I
> have to go through this twisting and turning to get the desired
> result.
>
> [ ... ]
>
> > These can be replaced by:
>
> >     if 'Search_Type' in line and 'Search_Provider' in line:
>
> > > re.sub(provider_matcher,normalize,line)
>
> > re.sub is returning the result, which you're throwing away!
>
> >         line = re.sub(provider_matcher,normalize,line)
>
> I can't count the number of times I have forgotten the meaning of
> 'returns a string' when reading docs about doing substitutions. In
> this case, I had put the `line = ' in and taken it out. And I should
> know better, from years of programming in Java, where strings are
> immutable and you _always_ get a new, returned string. Should be
> second nature.
>
> Thanks for the help, much appreciated.
>
> mp
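MRAB's point about the discarded result applies to every string method here: re.sub() never edits its input, it returns a new string. A minimal Python 3 sketch of the processing without a global variable follows. It assumes a hypothetical pattern with groups named `search` and `provider` (the thread never shows the real `provider_pattern`), leaves out `getProvider`/`getScenario`/`getOrg`, and uses `urllib.parse.quote`/`unquote`, the Python 3 locations of `urllib.quote`/`urllib.unquote`. The names `PROVIDER_PATTERN`, `process_line` and `process_file` are illustrative only.

```python
import re
import string
from urllib.parse import quote, unquote

# Hypothetical pattern: 'search' is the parameter name, 'provider' its value.
PROVIDER_PATTERN = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]*)')

def normalize(m):
    # The callback receives only the match object, so no global copy of the
    # line is needed. Its return value replaces the whole match (group 0).
    value = quote(string.capwords(unquote(m.group('provider'))))
    return m.group('search') + '=' + value

def process_line(line):
    # re.sub returns a new string; the caller must keep the result.
    return PROVIDER_PATTERN.sub(normalize, line)

def process_file(reader, writer):
    # A plain loop is the idiomatic way to write lines for their side effect,
    # instead of building a throwaway list with a comprehension.
    for line in reader:
        writer.write(process_line(line))
```

With this sketch, `process_line('&Search_Provider=chen&p=value')` gives `'&Search_Provider=Chen&p=value'`. The `else` branch of `processLine` has the same discarded-result problem: `line.replace(provider_parameter, org_parameter)` builds a new string and throws it away, so it would need `line = line.replace(...)`.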
Hello,

Well, that turned out to still be wrong. I did start getting the proper param=value back from my `normalize' function, but I got "extra" data as well.

This works:

<code>
def normalize(s) :
    return s.group('search') + '=' + urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))
</code>

Essentially, the pattern contains two groups, one identifying the parameter name and one the value. By concatenating the two back together, I was able to get the desired result.

I suppose the lesson is that the string the function returns replaces the entire match, not just the text captured by the named group.

Thanks.

mp

--
http://mail.python.org/mailman/listinfo/python-list
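Since the callback's return value always replaces `m.group(0)`, rewriting just one group means either rebuilding the rest of the match or keeping the rest out of the match. The Python 3 sketch below shows both under the same kind of hypothetical pattern (`PATTERN`, `VALUE_ONLY`, `fix_value` and `replace_group` are illustrative names, not from the thread). `replace_group` splices relative to the match's own span, a corrected form of the slicing idea that indexes `m.group(0)` rather than the whole line. `VALUE_ONLY` uses a fixed-width lookbehind so the match covers only the value.

```python
import re
import string
from urllib.parse import quote, unquote

def fix_value(text):
    # Same normalization as in the thread: unquote, capitalize words, re-quote.
    return quote(string.capwords(unquote(text)))

# Hypothetical pattern with the parameter name and value in separate groups.
PATTERN = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]*)')

def replace_group(m, name='provider'):
    # m.start()/m.end() are offsets into the whole input string, so shift
    # them by m.start(0) to index into the matched text itself.
    whole = m.group(0)
    start = m.start(name) - m.start(0)
    end = m.end(name) - m.start(0)
    return whole[:start] + fix_value(m.group(name)) + whole[end:]

# Alternative: the lookbehind is not part of the match, so the callback
# only needs to return the new value.
VALUE_ONLY = re.compile(r'(?<=Search_Provider=)[^&]*')

line = '&Search_Provider=chen&p=value'
print(PATTERN.sub(replace_group, line))                      # &Search_Provider=Chen&p=value
print(VALUE_ONLY.sub(lambda m: fix_value(m.group(0)), line))  # &Search_Provider=Chen&p=value
```

The lookbehind version is shorter but only works when the prefix has a fixed width, as Python's `re` requires. The splicing version works for any pattern with a named group.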