Hi! Have a look at http://www.crummy.com/software/BeautifulSoup/
BeautifulSoup is a Python module designed for parsing HTML.

Carlo

what is ITER? www.iter.org

>> First, excuse my English... English is not my native language, but
>> I hope that I will be able to describe my problem.
>>
>> I am new to Python for the web, but I want to do the following:
>>
>> Suppose I have an HTML page like this:
>> """
>> <title>TITLE</title>
>> <body>
>> body_1
>> <h1>1_1</h1>
>> <h2>2_1</h2>
>> <div id=one>div_one_1</div>
>> <p>p_1</p>
>> <p>p_2</p>
>> <div id=one>div_one_2</div>
>> <span class=sp_1>
>> sp_text
>> <div id=one>div_one_2</div>
>> <div id=one>div_one_3</div>
>> </span>
>> <h3>3_1</h3>
>> <h2>2_2</h2>
>> <p>p_3</p>
>> body_2
>> <h1>END</h1>
>> <table>
>> <tr><td>td_1</td>
>> <td class=sp_2>td_2</td>
>> <td>td_3</td>
>> <td>td_4</td></tr>
>> ...
>> </body>
>>
>> """
>>
>> I want to get all the info from this HTML into a dictionary that
>> looks like this:
>>
>> rezult = [{'title':['TITLE']},
>>           {'body':['body_1', 'body_2']},
>>           {'h1':['1_1', 'END']},
>>           {'h2':['2_1', '2_2']},
>>           {'h3':['3_1']},
>>           {'p':['p_1', 'p_2']},
>>           {'id_one':['div_one_1', 'div_one_2', 'div_one_3']},
>>           {'span_sp_1':['sp_text']},
>>           {'td':['td_1', 'td_3', 'td_4']},
>>           {'td_sp_2':['td_2']},
>>           ....
>>          ]
>>
>> Huh, hope you understand what I need.
>> Can you advise me what approaches exist for solving tasks of this
>> type, and maybe show some practical examples?
>> Thanks in advance for any kind of help.
>>
>>
>>
>> Try ElementTree or Amara.
>> http://effbot.org/zone/element-index.htm
>> http://uche.ogbuji.net/tech/4suite/amara/
>>
>> If you only care about the contents, BeautifulSoup is the answer.
>>
>> Ismael
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> http://mail.python.org/mailman/listinfo/tutor
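
For what it's worth, here is a minimal sketch of the kind of grouping described in the question. It uses the standard library's html.parser rather than BeautifulSoup, only so that it runs without installing anything. The class name TextByTag, the helper collect() and the key scheme (id_<id> if the tag has an id, <tag>_<class> if it has a class, otherwise the plain tag name) are my own assumptions modelled on the example above, not part of any library. The sample page is a shortened version of the one in the question.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Group each piece of text under a key built from its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # (tag, key) pairs for the open tags
        self.result = defaultdict(list)  # key -> list of text fragments

    @staticmethod
    def make_key(tag, attrs):
        # Hypothetical naming scheme, copied from the keys in the question.
        attrs = dict(attrs)
        if "id" in attrs:
            return "id_" + attrs["id"]
        if "class" in attrs:
            return tag + "_" + attrs["class"]
        return tag

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.make_key(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Text belongs to the innermost tag that is currently open.
            self.result[self.stack[-1][1]].append(text)


def collect(html):
    parser = TextByTag()
    parser.feed(html)
    parser.close()
    return dict(parser.result)


sample = """
<title>TITLE</title>
<body>
body_1
<h1>1_1</h1>
<div id=one>div_one_1</div>
<p>p_1</p>
<span class=sp_1>
sp_text
<div id=one>div_one_2</div>
</span>
body_2
<table><tr><td>td_1</td><td class=sp_2>td_2</td></tr></table>
</body>
"""

result = collect(sample)
for key, texts in result.items():
    print(key, texts)
```

Note that this returns one dictionary mapping each key to a list of strings, instead of a list of one-key dictionaries; that seemed easier to look things up in. Real-world pages with unclosed tags would probably need a more forgiving parser, which is where BeautifulSoup helps.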