Tommy Kaas wrote:

> I'm trying to learn basic web scraping and starting from scratch. I'm
> using ActivePython 2.6.6.
> I have uploaded a simple table on my web page and try to scrape it and
> will save the result in a text file. I will separate the columns in the
> file with #.
> It works fine, but besides # I also get spaces between the columns in the
> text file. How do I avoid that?
>
> This is the script:
>
> import urllib2
> from BeautifulSoup import BeautifulSoup
> f = open('tabeltest.txt', 'w')
> soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())
> rows = soup.findAll('tr')
> for tr in rows:
>     cols = tr.findAll('td')
>     print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string
>
> f.close()
>
> And the text file looks like this:
>
> Kommunenr # Kommune # Region # Regionsnr
> 101 # København # Hovedstaden # 1084
> 147 # Frederiksberg # Hovedstaden # 1084
> 151 # Ballerup # Hovedstaden # 1084
> 153 # Brøndby # Hovedstaden # 1084

The print statement automatically inserts a space between the items you
pass it, so you can either resort to the write method

    for i in range(4):
        if i:
            f.write("#")
        f.write(cols[i].string)
    f.write("\n")  # unlike print, write() does not add the newline itself

which is a bit clumsy, or you build the complete line and then print it as a
whole:

    print >> f, "#".join(col.string for col in cols)

Note that you have non-ASCII characters in your data -- I'm surprised that
writing to a file works for you. I would expect that

    import codecs
    f = codecs.open("tmp.txt", "w", encoding="utf-8")

is needed to successfully write your data to a file.

Peter

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
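
For illustration, here is a minimal sketch of the "build the complete line, then write it" approach combined with an explicit UTF-8 encoding. It is not the script from the thread: the helper names format_row and write_rows are hypothetical, the rows are hard-coded from the output quoted in the question instead of being scraped, and io.open is swapped in for codecs.open so that the same code should run on Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io


def format_row(cells, sep="#"):
    """Join the cell texts with sep only, so no extra spaces appear."""
    return sep.join(cells)


def write_rows(rows, path, sep="#"):
    """Write one joined line per row to path, encoded as UTF-8."""
    # io.open encodes the text on write; this stands in for codecs.open.
    with io.open(path, "w", encoding="utf-8") as f:
        for cells in rows:
            f.write(format_row(cells, sep) + "\n")


# Sample data copied from the question's output (assumption: in the real
# script each row would come from [col.string for col in cols]).
rows = [
    ["Kommunenr", "Kommune", "Region", "Regionsnr"],
    ["101", "København", "Hovedstaden", "1084"],
    ["147", "Frederiksberg", "Hovedstaden", "1084"],
]

lines = [format_row(cells) for cells in rows]
# lines[1] == "101#København#Hovedstaden#1084"
```

Calling write_rows(rows, "tabeltest.txt") would then produce the file with "#" as the only separator. Keeping the joining in its own small function makes it easy to check the line format without touching the network or the file system.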