I am trying to get CSV output from an HTML file. With this code I had a little success:

=========================
from BeautifulSoup import BeautifulSoup
from string import replace, join
import re
f = open("configuration.html","r")
g = open("configuration.csv",'w')
soup = BeautifulSoup(f)
t = soup.findAll('table')
for table in t:
    rows = table.findAll('tr')
    for th in rows[0]:
        t = th.find(text=True)
        g.write(t)
        g.write(',')
        # print(','.join(t))
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            try:
                t = td.find(text=True).replace(' ','')
                g.write(t)
            except:
                g.write('')
            g.write(",")
        g.write("\n")
===============================

producing output like this:

RULE,SOURCE,DESTINATION,SERVICES,ACTION,TRACK,TIME,INSTALL ON,COMMENTS,
1,,,,drop,Log,Any,,,
2,All us...@any,,Any,clientencrypt,Log,Any,,,
3,Any,Any,,drop,None,Any,,,
4,,,,drop,None,Any,,,
...

It left out everything in a <td></td> that was not plain text.

I then tried using t.renderContents and got something like this (one line broken into many for the sake of this email):

1,<img src=icons/group.png> <a href=#OBJ_sunetint> sunetint</A><BR>,
<img src=icons/gateway_cluster.png> <a href=#OBJ_Rainwall_Cluster >Rainwall_Cluster</A> <BR>,
<img src=icons/udp.png> <a href=#SVC_IKE >IKE</a><br>,
<img src=icons/drop.png> drop,
<img src=icons/log.png> Log ,
<img src=icons/any.png> Any<br> ,
<img src=icons/gateway_cluster.png> <a href=#OBJ_Rainwall_Cluster >Rainwall_Cluster</A> <BR> ,

How do I get BeautifulSoup to render (taking the above line as an example) "sunetint" for

<img src=icons/group.png> <a href=#OBJ_sunetint>sunetint</A><BR>

and still provide the text parts of the <td>s as plain text?

I have experimented a little with regular expressions, but so far could not find a solution.

Regards
Johann

-- 
Johann Spies          Telephone: 021-808 4599
Information Technology, Universiteit van Stellenbosch

"Lo, children are an heritage of the LORD: and the fruit of the womb is his reward."
    Psalms 127:3
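The empty cells in the first output are consistent with `td.find(text=True)` returning only the first text node of each cell: for the "sunetint" cell that is most likely the single space between `<img>` and `<a>`, which `.replace(' ','')` then turns into an empty string, while cells like "drop" and "Log" survive because their text sits directly after the `<img>`. The fix is to join all text nodes of a cell. With BeautifulSoup itself, the analogous change would be joining everything from `td.findAll(text=True)` instead of taking `td.find(text=True)`. A minimal sketch of the same idea follows, swapped onto Python 3 and the standard library's `html.parser` instead of BeautifulSoup; `TableTextParser` and `html_tables_to_csv` are hypothetical names, and it assumes the tables are not nested.

```python
import csv
import io
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr>.

    All text fragments inside a cell are joined, so tags such as <img>,
    <a> and <br> are dropped while the link text itself is kept.
    Assumption: tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Join all fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names lower-cased, so <A> arrives as "a".
        if tag == "tr":
            self._finish_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_tables_to_csv(html_text):
    """Return the text of all table rows in html_text as CSV."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Usage on a cut-down version of the table above:
sample = (
    "<table>"
    "<tr><th>RULE</th><th>SOURCE</th><th>ACTION</th></tr>"
    "<tr><td>1</td>"
    "<td><img src=icons/group.png> <a href=#OBJ_sunetint>sunetint</A><BR></td>"
    "<td><img src=icons/drop.png> drop</td></tr>"
    "</table>"
)
print(html_tables_to_csv(sample))  # rows: RULE,SOURCE,ACTION and 1,sunetint,drop
```

Using the `csv` module rather than writing commas by hand also quotes any cell that itself contains a comma. When writing to a file, open it with `newline=""` so the writer controls line endings.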