need help with re module

linuxprog Wed, 20 Jun 2007 09:59:23 -0700

Hello,

I have the string "<html>hello</a>world<anytag>ok" and I want to
extract all the text, without the HTML tags. The result should be
something like: helloworldok


I have tried this:

        from re import findall

        chaine = """<html>hello</a>world<anytag>ok"""

        print findall('[a-zA-z][^(<.*>)].+?[a-zA-Z]',chaine)
      
       >>> ['html', 'hell', 'worl', 'anyt', 'ag>o']

The result is not correct! What would be the correct regex to use?
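
For reference, here is a minimal sketch of one possible approach, not a definitive answer. In the attempt above, `[^(<.*>)]` is a character class that excludes the single characters `(`, `<`, `.`, `*`, `>` and `)` rather than a "not a tag" group, and the range `A-z` also covers a few punctuation characters between `Z` and `a`. It is usually simpler to remove the tags than to match the text around them. The sketch assumes that a tag is `<`, then anything except `>`, then `>`, so it will not cope with a literal `>` inside an attribute value, nor with comments or scripts. The names `TAG`, `texte` and `morceaux` are illustrative only.

```python
import re

chaine = """<html>hello</a>world<anytag>ok"""

# Assumption: a tag is '<', then any run of characters other than '>', then '>'.
TAG = re.compile(r'<[^>]*>')

# Option 1: delete every tag and keep whatever is left.
texte = TAG.sub('', chaine)
print(texte)  # helloworldok

# Option 2: split on the tags to get the text pieces separately,
# dropping the empty strings produced by adjacent or leading tags.
morceaux = [s for s in TAG.split(chaine) if s]
print(morceaux)  # ['hello', 'world', 'ok']
```

For real-world HTML, a proper parser is likely to be more robust than any single regex.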



