Hi, how can I parse an HTML string? I need to parse the following line: '<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD><FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>'
And this is the program:
==============================================
#!/usr/bin/env python
from sgmllib import SGMLParser
import urllib
import pdb

class ParserHTML(SGMLParser):
    #pdb.set_trace()
    def unknown_starttag(self, tag, attrs):
        # Rebuild the text of the start tag from its name and attributes.
        value = 0
        startTAG = '<' + tag
        for i in attrs:
            if(i[0].lower() == i[1].lower() and not i[0] == i[1]):
                startTAG = startTAG[:-1] + ' ' + str(i[1])
                value = 1
            else:
                startTAG += ' ' + str(i[0]) + '="' + str(i[1]) + '"'
                value = 0
            if(value == 1): startTAG += '"'
        startTAG += '>'

    def handle_data(self, data):
        # Called once for each run of text between tags.
        #print data
        detalle = []
        detalle2 = []
        a = ''
        for pruebas in data:
            #pruebas = data
            detalle.extend(pruebas)
            a = ''.join([a, pruebas])
        detalle2.append(a)
        print detalle2
        return detalle2

    def P_main(self, atr):
        # feed() returns None, so dts below never receives the parsed data.
        return self.feed(atr)

if __name__ == '__main__':
    node = '<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD><FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>'
    p = ParserHTML()
    dts = p.P_main(node)
==============================================
The output of the program is:
bash-3.1$ ./pruebasDOM.py
['BSCS status']
['string']
['none']
['TopCre_life']
['integer']
['0']
I can't get the data into a single dict() or list. Each value is printed in its own list, but I need all the values together:
['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']
How can I do that?
Thanks and greetings.
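
One possible way to collect every value in a single list is sketched below. It is only a sketch under assumptions: it targets Python 3, where sgmllib is no longer available, so it swaps in html.parser.HTMLParser from the standard library, and the class name FieldCollector and its values attribute are made up for illustration. The idea is to store each text chunk on the parser instance, since whatever handle_data returns is discarded by feed().

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Hypothetical helper: gathers the text between tags into one list."""

    def __init__(self):
        super().__init__()
        self.values = []  # every non-empty text chunk, in document order

    def handle_data(self, data):
        # Called for each run of text between tags. Keep the text on the
        # instance; a return value here would be ignored by feed().
        text = data.strip()
        if text:
            self.values.append(text)


node = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE>'
        '<VALUE>none</VALUE></FIELD>'
        '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE>'
        '<VALUE>0</VALUE></FIELD>')

parser = FieldCollector()
parser.feed(node)
parser.close()
print(parser.values)
```

After feed() finishes, parser.values should hold all six strings in the order they appear in the input. The same pattern (append to an instance attribute inside handle_data) would presumably carry over to the SGMLParser subclass above.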
_______________________________________________ python-win32 mailing list [email protected] http://mail.python.org/mailman/listinfo/python-win32
