Tommy Kaas wrote:

> Steven D'Aprano wrote:
>> But in your case, the best way is not to use print at all. You are
>> writing to a file -- write to the file directly, don't mess about with
>> print. Untested:
>>
>> f = open('tabeltest.txt', 'w')
>> url = 'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm'
>> soup = BeautifulSoup(urllib2.urlopen(url).read())
>> rows = soup.findAll('tr')
>> for tr in rows:
>>     cols = tr.findAll('td')
>>     output = "#".join(cols[i].string for i in (0, 1, 2, 3))
>>     f.write(output + '\n')  # don't forget the newline after each row
>> f.close()
>
> Steven, thanks for the advice.
> I see the point. But now I have problems with the Danish characters. I get
> this:
>
> Traceback (most recent call last):
>   File "C:/pythonlib/kursus/kommuner-regioner_ny.py", line 36, in <module>
>     f.write(output + '\n')  # don't forget the newline after each row
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in
> position 5: ordinal not in range(128)
>
> I have tried to add # -*- coding: utf-8 -*- to the top of the script, but
> it doesn't help?
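You can trigger the same error without BeautifulSoup and without the coding
cookie; here's a minimal sketch (u'\xf8' is the 'ø' from your data, and
'demo.txt' is just a scratch file name):

f = open('demo.txt', 'w')

# This line would raise exactly the error from your traceback, because the
# plain file object implicitly encodes the unicode string with the ascii
# codec:
#     f.write(u'K\xf8benhavn\n')

# Encoding explicitly (or using codecs.open, see below) works:
f.write(u'K\xf8benhavn\n'.encode('utf-8'))
f.close()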
The coding cookie only affects unicode string constants in the source code;
it doesn't change how the unicode data coming from BeautifulSoup is handled.
As I suspected in my other post, you have to convert your data to a specific
encoding (I use UTF-8 below) before you can write it to a file:

import urllib2
import codecs
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen(
    'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read()
soup = BeautifulSoup(html)

with codecs.open('tabeltest.txt', "w", encoding="utf-8") as f:
    rows = soup.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        print >> f, "#".join(col.string for col in cols)

The with statement implicitly closes the file, so you can avoid f.close() at
the end of the script.

Peter
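P.S. If you would rather stay close to Steven's version and skip codecs, a
rough equivalent (untested) is to encode each line by hand before writing
it -- the essential part is the .encode("utf-8") call:

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm'
soup = BeautifulSoup(urllib2.urlopen(url).read())

f = open('tabeltest.txt', 'w')
for tr in soup.findAll('tr'):
    cols = tr.findAll('td')
    line = "#".join(col.string for col in cols)
    # col.string gives unicode objects; encode explicitly so the plain
    # file object never falls back to the ascii codec
    f.write(line.encode("utf-8") + "\n")
f.close()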