Re: Clean "Durty" strings

rzed Mon, 02 Apr 2007 16:21:10 -0700

"Diez B. Roggisch" <[EMAIL PROTECTED]> wrote in
news:[EMAIL PROTECTED]:


>> 
>> If the OP is constrained to standard libraries, then it may be
>> a question of defining what should be done more clearly. The
>> extraneous spaces can be removed by tokenizing the string and
>> rejoining the tokens. Replacing portions of a string with
>> equivalents is standard stuff. It might be preferable to create
>> a function that will accept lists of from and to strings and
>> translate the entire string by successively applying the
>> replacements. From what I've seen so far, that would be all the
>> OP needs for this task. It might take a half-dozen lines of
>> code, plus the from/to table definition. 
> 
> The OP had <br>-tags in his text. Which is _more_ than a half
> dozen lines of code to clean up. Because your simple
> replacement-approach won't help here: 
> 
> <br>foo <br> bar </br>
> 
> Which is perfectly legal HTML, but nasty to parse.

Well, as I said, given the input the OP supplied, it's not even 
necessary to parse it. It isn't clear what the true desired 
operation is, but this seems to meet the criteria given:

<code -- the string 's' is wrapped nastily, but ...>
s ="""\
bonne mentalit&eacute; mec!:) \n                        <br>bon 
pour
info moi je suis un serial posteur arceleur dictateur ^^*
\n                        <br>mais pour avoir des resultats 
probant il
faut pas faire les mariolles, comme le &quot;fondateur&quot; de 
bvs
krew \n
mais pour avoir des resultats probant il faut pas faire les 
mariolles,
comme le &quot;fondateur&quot; de bvs krew \n"""

fromlist = ['<br>', '&eacute;', '&quot;']
tolist   = ['',     'é', '"' ]


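def withReplacements(s, flist, tlist):
    # Apply each from/to pair in table order; later pairs see the
    # result of earlier ones.
    for f, t in zip(flist, tlist):
        s = s.replace(f, t)
    return s

# Collapse every run of whitespace to a single space, then substitute.
print(withReplacements(' '.join(s.split()), fromlist, tolist))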

</code>
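
For anyone who wants to see the two steps in isolation, here is a minimal sketch on a shortened sample. The names `raw`, `collapsed`, `cleaned` and `pairs` are made up for illustration and are not part of the code above.

```python
# Step 1: str.split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines), so joining with ' ' collapses each run.
raw = "bonne   mentalit&eacute; \n      <br>bon pour info"
collapsed = ' '.join(raw.split())

# Step 2: successive replacements, applied in table order.
pairs = [('<br>', ''), ('&eacute;', 'é')]
cleaned = collapsed
for old, new in pairs:
    cleaned = cleaned.replace(old, new)

print(collapsed)  # bonne mentalit&eacute; <br>bon pour info
print(cleaned)    # bonne mentalité bon pour info
```

Because the pairs are applied one after another, their order can matter: a later pair operates on whatever the earlier ones produced.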

If the question is about efficiency or robustness or generality, 
then that's another set of issues, but that's for the 1.1 version 
to handle. 
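
For what it's worth, a somewhat more general version could still stay within the standard library. The following is only a sketch under assumptions: it presumes a Python 3 interpreter where `html.unescape` is available, the helper name `clean` and the pattern `BR_TAG` are hypothetical, and it replaces each tag with a space rather than an empty string (a deliberate change from the table above, so that adjacent words do not run together). It is still not an HTML parser; it only recognises `br` variants.

```python
import html
import re

# Matches <br>, <BR>, <br/>, <br /> and the stray </br> form.
BR_TAG = re.compile(r'</?br\s*/?>', re.IGNORECASE)

def clean(text):
    """Hypothetical helper: drop <br> variants, decode entities, squeeze whitespace."""
    text = BR_TAG.sub(' ', text)   # a space keeps neighbouring words apart
    text = html.unescape(text)     # &eacute; -> é, &quot; -> ", and so on
    return ' '.join(text.split())  # collapse runs of whitespace

print(clean('<br>foo <br> bar </br>'))  # foo bar
print(clean('bonne mentalit&eacute; mec!:) \n   <BR />bon pour info'))
```

Decoding entities with `html.unescape` removes the need to maintain a from/to table for every entity that might turn up, at the cost of decoding all of them, whether wanted or not.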

-- 
rzed

-- 
http://mail.python.org/mailman/listinfo/python-list
