Hello, I have the string "<html>hello</a>world<anytag>ok" and I want to extract all the text, without the HTML tags. The result should be something like: helloworldok
I have tried this:

    from re import findall
    chaine = """<html>hello</a>world<anytag>ok"""
    print findall('[a-zA-z][^(<.*>)].+?[a-zA-Z]',chaine)
    >>> ['html', 'hell', 'worl', 'anyt', 'ag>o']

The result is not correct! What would be the correct regex to use?

-- http://mail.python.org/mailman/listinfo/python-list
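One possible approach, offered as a sketch rather than a definitive answer: instead of trying to match the text with findall, remove the tags with re.sub. In the attempt above, `[^(<.*>)]` is a character class, not a group, so it matches a single character that is none of `(`, `<`, `.`, `*`, `>`, `)`. The helper name `strip_tags` below is hypothetical, and the pattern assumes that no literal `>` appears inside a tag (for example in an attribute value). For real-world HTML, a parser such as the standard library's html.parser is usually more robust than a regex.

```python
import re

# Matches one "tag": a '<', then any run of characters that are not '>',
# then the closing '>'. Assumption: no '>' occurs inside the tag itself.
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(markup):
    """Return markup with every tag-like substring removed (hypothetical helper)."""
    return TAG_PATTERN.sub("", markup)


chaine = "<html>hello</a>world<anytag>ok"
print(strip_tags(chaine))  # prints: helloworldok
```

This is written for Python 3, so print is called as a function. The text between the tags is joined with no separator, which matches the requested result.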