Perhaps replace:

    lines=soup.get_text()
    file.write(lines)

...with something like:

    text = soup.get_text()
    lines = text.split('\n')
    for line in lines:
        if line.strip():
            file.write('%s\n' % (line, ))

(untested)

On Tue, Feb 27, 2018 at 2:50 AM, <jenswaelk...@gmail.com> wrote:
> Dear all,
> I try to get the numerical data from the following webpage:
> http://www.astro.oma.be/GENERAL/INFO/nzon/zon_2018.html
>
> With the following code-fragment I was already able to get a partial result:
>
> #!/usr/bin/env python
> #memo: install bs4 as follows: sudo easy_install bs4
> # -*- coding: utf-8 -*-
> #3 lines below necessary to avoid encoding problem
> import sys
> reload(sys)
> sys.setdefaultencoding('utf8')
> import urllib2
> file = open("testfile.txt","w")
> source = "http://www.astro.oma.be/GENERAL/INFO/nzon/zon_2018.html"
> page = urllib2.urlopen(source)
> from bs4 import BeautifulSoup
> soup = BeautifulSoup(page,'lxml')
> lines=soup.get_text()
> file.write(lines)
> file.close()
>
> I tried to delete the empty lines but I am totally stuck at this moment, can
> anyone help me further?
>
> thanks in advance
> jens
> --
> https://mail.python.org/mailman/listinfo/python-list
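
For anyone who wants to try the blank-line filter from the reply above without the download step, here is a minimal sketch that works on a plain string. The function name drop_blank_lines and the sample text are made up for illustration, and it uses str.splitlines() rather than split('\n') on the assumption that the page text might contain '\r\n' line endings. It only drops lines that are empty or all whitespace; the remaining lines keep their leading spaces.

```python
def drop_blank_lines(text):
    """Return text with empty and whitespace-only lines removed.

    Hypothetical helper, not part of bs4 or the original script.
    """
    # splitlines() handles '\n', '\r\n' and '\r' alike
    kept = [line for line in text.splitlines() if line.strip()]
    # Re-join with a single newline and end with one, unless nothing is left
    return "\n".join(kept) + "\n" if kept else ""


if __name__ == "__main__":
    # Made-up sample standing in for the output of soup.get_text()
    sample = "Day  Rise  Set\n\n   \n 1  08:45  16:47\r\n\r\n 2  08:45  16:48\n"
    print(drop_blank_lines(sample), end="")
```

In the original script this would presumably be used as file.write(drop_blank_lines(soup.get_text())), which is just as untested against the real page as the loop version.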