[Tutor] Avoiding repetitive pattern match in re module

Intercodes Thu, 05 Jan 2006 02:41:54 -0800

Hello everyone,

    I am new to this mailing list as well as to Python (uptime: 3 weeks). Today I learned about regular expressions from http://www.amk.ca/python/howto/regex/, which was really helpful. I started working through a few examples of my own. The first one was to collect all the HTML tags used in an HTML file. I wrote this code:

------------------------------
import re
file1=open(raw_input("\nEnter The path of the HTML file: "),"r")
ans=""
while 1:
    data=file1.readline()   # read the file one line at a time
    if data=="":            # readline() returns "" at end of file
        break
    ans=ans+data

",ans) # to make tags such as <link rel..> to <link>rel">
ans1=re.sub(r' .*?',">",ans) # replace each space with ">" so <link rel..> becomes <link>rel..
match=re.findall(r'<[^/]?[a-zA-Z]+.*?>',ans1)
print match
---------------------------------
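A variant sketch, not the code above: putting a capture group around the tag name lets findall return just the names, so the space-to-">" substitution is not needed. TAG_NAME and tag_names are hypothetical names chosen for illustration; the pattern assumes a tag name starts with a letter, and tags inside comments or <script> blocks are still picked up.

```python
import re

# Hypothetical pattern: capture only the name right after "<".
# Closing tags such as </a> are skipped because "/" is not a letter.
# Tags inside HTML comments or scripts are still matched; a real
# HTML parser would be needed to avoid that.
TAG_NAME = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)')

def tag_names(html):
    """Return every opening-tag name in html, in document order."""
    # With exactly one group, findall returns the group's text for each match.
    return TAG_NAME.findall(html)

sample = '<html><link rel="style"> <a href="x">one</a><br><a>two</a><br>'
print(tag_names(sample))  # ['html', 'link', 'a', 'br', 'a', 'br']
```

The repeats are still there at this stage; the capture group only removes the need to rewrite the text first.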

I get the output, but with tags repeated. I want to display all the tags used in a file, with no repetitions. For example, the output for one of the HTML files I tried was: "<html><link> <a><br><a><br>"

Instead of writing a new 'for'/'if' loop to filter the repeated tags out of the list, is there something I can add to the regular expression itself so that each tag is matched only once?
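As far as I know, re.findall has no flag that drops repeated matches, so one hedged sketch is to collect the names and pass them through a set: sorted(set(...)) needs no explicit for/if loop. If the filtering really has to live inside the pattern, a negative lookahead with a backreference can reject any tag that appears again later in the text. The names below are hypothetical; lowercasing in unique_tags assumes tag names are case-insensitive (as in HTML), while the lookahead version is case-sensitive and gets slow on large files.

```python
import re

# Hypothetical pattern: the tag name right after "<".
TAG_NAME = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)')

# Regex-only variant: a name matches only if the negative lookahead finds
# no later "<name" with the same name (\1 is a backreference to group 1),
# so each tag is reported once, at its last occurrence. The \b after the
# group stops a partial name (e.g. "htm" from "html") from matching.
# re.DOTALL lets ".*" scan across newlines. The lookahead rescans the rest
# of the text for every tag, so the cost grows roughly quadratically.
LAST_OCCURRENCE = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b(?!.*<\1\b)', re.DOTALL)

def unique_tags(html):
    """Distinct tag names, lowercased and sorted, using a set instead of a loop."""
    return sorted(set(name.lower() for name in TAG_NAME.findall(html)))

def unique_tags_regex_only(html):
    """Distinct tag names in order of their last occurrence, using only re."""
    return LAST_OCCURRENCE.findall(html)

sample = '<html><link rel="style"> <a href="x">one</a><br><a>two</a><br>'
print(unique_tags(sample))             # ['a', 'br', 'html', 'link']
print(unique_tags_regex_only(sample))  # ['html', 'link', 'a', 'br']
```

The set version is usually the simpler choice; the lookahead version mainly shows that the filtering can be expressed in the pattern itself.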

Thank You
--
Intercodes

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
