Hi Michael, 

Would a non-regex approach be any help? I can think of a way that uses only string methods:


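space = " "

stringStuff = "Stuff with    multiple   spaces"
indexN = 0
ranges = []
while True:
    try:
        # Find the next space at or after indexN.
        indexN = stringStuff.index(space, indexN)
    except ValueError:
        break  # no spaces left
    # Walk to the end of this run of spaces.
    indexT = indexN
    while indexT < len(stringStuff) and stringStuff[indexT] == space:
        indexT += 1
    # Only runs of two or more spaces need collapsing.
    if indexT - indexN > 1:
        ranges.append((indexN, indexT))
    indexN = indexT

# Work from the end so earlier indexes stay valid as the string shrinks.
ranges.reverse()
for (low, high) in ranges:
    stringStuff = stringStuff[:low] + space + stringStuff[high:]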
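
If all you need is runs of plain spaces squeezed down to one, a shorter string-method variant might do. This is only a sketch, and collapse_spaces is a name I made up for illustration:

```python
def collapse_spaces(text, space=" "):
    """Collapse each run of two or more spaces into a single space."""
    double = space * 2
    # Each pass shortens every remaining run; repeat until none are left.
    while double in text:
        text = text.replace(double, space)
    return text


print(collapse_spaces("Stuff  with    multiple   spaces"))
```

Because it only ever looks for the space character, newlines are left alone, so blank lines should survive.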

HTH
Liam Clarke
             


On Tue, 4 Jan 2005 15:39:18 -0800, Michael Powe <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> I'm having erratic results with a regex.  I'm hoping someone can
> pinpoint the problem.
> 
> This function removes HTML formatting codes from a text email that is
> poorly exported -- it is supposed to be a text version of an HTML
> mailing, but it's basically just a text version of the HTML page.  I'm
> not after anything elaborate, but it has gotten to be a bit of an
> itch.  ;-)
> 
> def parseFile(inFile) :
>     import re
>     bSpace = re.compile("^ ")
>     multiSpace = re.compile(r"\s\s+")
>     nbsp = re.compile(r"&nbsp;")
>     HTMLRegEx = re.compile(r"(&lt;|<)/?((!--.*--)|(STYLE.*STYLE)|(P|BR|b|STRONG))/?(&gt;|>)", re.I)
> 
>     f = open(inFile,"r")
>     lines = f.readlines()
>     newLines = []
>     for line in lines :
>         line = HTMLRegEx.sub(' ',line)
>         line = bSpace.sub('',line)
>         line = nbsp.sub(' ',line)
>         line = multiSpace.sub(' ',line)
>         newLines.append(line)
>     f.close()
>     return newLines
> 
> Now, the main issue I'm looking at is with the multiSpace regex.  When
> applied, this removes some blank lines but not others.  I don't want
> it to remove any blank lines, just contiguous multiple spaces in a
> line.
> 
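
On the vanishing blank lines: \s also matches the newline that readlines() leaves on the end of each line. So any line that ends in a space followed by "\n" (including a "blank" line that really holds a space) has its newline swallowed by \s\s+, while a line that is just "\n" is left alone. One possible fix, assuming you only want spaces and tabs collapsed, is to leave the newline out of the character class. A sketch, with squeeze as a hypothetical helper name:

```python
import re

# Match runs of two or more spaces/tabs only; "\n" is deliberately excluded.
multiSpace = re.compile(r"[ \t]{2,}")


def squeeze(line):
    """Collapse runs of spaces or tabs in one line, keeping its newline."""
    return multiSpace.sub(" ", line)


lines = ["some   text \n", " \n", "\n", "more    text\n"]
print([squeeze(line) for line in lines])
```
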
> BTB, this also illustrates a difference between python and perl -- in
> perl, i can change "line" and it automatically changes the entry in
> the array; this doesn't work in python.  A bit annoying, actually.
> ;-)
> 
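
On changing "line" inside the loop: the loop variable is just a name bound to each string in turn, and strings are immutable, so rebinding it never touches the list. If you do want to update the list in place, one option is to assign back by index. A small sketch with made-up data (enumerate needs Python 2.3 or later):

```python
lines = ["  first\n", "second  \n"]

# Rebinding `line` alone would leave `lines` unchanged, so write each
# new value back into its slot by index.
for i, line in enumerate(lines):
    lines[i] = line.replace("  ", " ")

print(lines)
```

Building a fresh list, as your function already does with newLines, works just as well.
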
> Thanks for any help.  If there's a better way to do this, I'm open to
> suggestions on that regard, too.
> 
> mp
> _______________________________________________
> Tutor maillist  -  [email protected]
> http://mail.python.org/mailman/listinfo/tutor
> 


-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.