Hi!

Take a look at http://www.crummy.com/software/BeautifulSoup/

BeautifulSoup is a Python module designed for parsing HTML.

Carlo
 
what is ITER? www.iter.org
 
 
>>
>>              First, excuse my English... English is not my native
>>              language, but I hope that I will be able to describe
>>              my problem.
>>
>>              I am new to Python for the web, but I want to do the
>>              following:
>>
>>              Suppose I have an HTML page like this:
>>              """
>>              <title>TITLE</title>
>>              <body>
>>              body_1
>>              <h1>1_1</h1>
>>              <h2>2_1</h2>
>>              <div id=one>div_one_1</div>
>>              <p>p_1</p>
>>              <p>p_2</p>
>>              <div id=one>div_one_2</div>
>>              <span class=sp_1>
>>              sp_text
>>              <div id=one>div_one_2</div>
>>              <div id=one>div_one_3</div>
>>              </span>
>>              <h3>3_1</h3>
>>              <h2>2_2</h2>
>>              <p>p_3</p>
>>              body_2
>>              <h1>END</h1>
>>              <table>
>>              <tr><td>td_1</td>
>>              <td class=sp_2>td_2</td>
>>              <td>td_3</td>
>>              <td>td_4</td></tr>
>>              ...
>>              </body>
>>
>>              """
>>
>>              I want to get all the info from this HTML into a
>>              dictionary that looks like this:
>>
>>              rezult = [{'title':['TITLE']},
>>              {'body':['body_1', 'body_2']},
>>              {'h1':['1_1', 'END']},
>>              {'h2':['2_1', '2_2']},
>>              {'h3':['3_1']},
>>              {'p':['p_1', 'p_2']},
>>              {'id_one':['div_one_1', 'div_one_2', 'div_one_3']},
>>              {'span_sp_1':['sp_text']},
>>              {'td':['td_1', 'td_3', 'td_4']},
>>              {'td_sp_2':['td_2']},
>>              ....
>>              ]
>>
>>              I hope you understand what I need.
>>              Can you advise me on what approaches exist for solving
>>              tasks of this type, and maybe show some practical
>>              examples?
>>              Thanks in advance for any help.
>>
>>
>>
>>      Try ElementTree or Amara.
>>      http://effbot.org/zone/element-index.htm
>>      http://uche.ogbuji.net/tech/4suite/amara/
>>
>>      If you only care about the contents, BeautifulSoup is the answer.
>>
>>      Ismael
>>      _______________________________________________
>>      Tutor maillist  -  Tutor@python.org
>>      http://mail.python.org/mailman/listinfo/tutor
>>
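
For reference, here is a minimal sketch of the grouping asked about in the
quoted question. It is only one possible approach, and it uses the standard
library's html.parser module rather than BeautifulSoup, ElementTree or Amara,
so that it runs without installing anything. The names TextByTag and
group_text are made up for this illustration. The key naming ('id_one',
'span_sp_1', 'td_sp_2') is an assumption inferred from the desired output in
the question, and the result is a single dictionary rather than a list of
one-item dictionaries.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text found directly inside each element, grouped by key."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # (tag, key) pairs for open elements
        self.result = defaultdict(list)  # key -> list of text fragments

    @staticmethod
    def make_key(tag, attrs):
        # Assumed naming: 'id_<id>' if the tag has an id,
        # '<tag>_<class>' if it has a class, else the bare tag name.
        attrs = dict(attrs)
        if attrs.get("id"):
            return "id_" + attrs["id"]
        if attrs.get("class"):
            return tag + "_" + attrs["class"]
        return tag

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.make_key(tag, attrs)))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attribute non-blank text to the innermost open element.
        text = data.strip()
        if text and self.stack:
            self.result[self.stack[-1][1]].append(text)


def group_text(html):
    """Return a plain dict mapping each key to its list of text fragments."""
    parser = TextByTag()
    parser.feed(html)
    parser.close()
    return dict(parser.result)


sample = """
<title>TITLE</title>
<body>
body_1
<h1>1_1</h1>
<div id=one>div_one_1</div>
<span class=sp_1>
sp_text
<div id=one>div_one_2</div>
</span>
<p>p_1</p>
body_2
<table><tr><td>td_1</td><td class=sp_2>td_2</td></tr></table>
</body>
"""

result = group_text(sample)
for key, texts in result.items():
    print(key, texts)
```

This sketch assumes every opened tag is eventually closed. With sloppier
real-world HTML (for example a <p> that is never closed), text can end up
attributed to the wrong element, which is where a more forgiving parser such
as BeautifulSoup is likely to be the better choice.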
