Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-08 Thread Nobody
On Wed, 05 Oct 2011 21:39:17 -0700, Greg wrote: Here is the final code for those who are struggling with similar problems: ## open and decode file # In this case, the encoding comes from the charset argument in a meta tag # e.g. meta charset=iso-8859-2 fileObj = open(filePath, "r").read()

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Ulrich Eckhardt
On 06.10.2011 05:40, Steven D'Aprano wrote: (4) Do all your processing in Unicode, not bytes. (5) Encode the text into bytes using UTF-8 encoding. (6) Write the bytes to a file. Just wondering, why do you split the latter two parts? I would have used codecs.open() to open the file and

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Chris Angelico
On Thu, Oct 6, 2011 at 8:29 PM, Ulrich Eckhardt ulrich.eckha...@dominalaser.com wrote: Just wondering, why do you split the latter two parts? I would have used codecs.open() to open the file and define the encoding in a single step. Is there a downside to this approach? Those two steps still
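
As a rough illustration of the question above, the sketch below writes the same text twice: once by encoding to UTF-8 and writing the bytes explicitly, and once by letting codecs.open() do the encoding on write. The file names and the sample string are made up for the example, and the code assumes Python 3.

```python
import codecs
import os
import tempfile

# Sample Unicode text, assumed purely for illustration.
text = "Branie zak\u0142adnik\u00f3w"

workdir = tempfile.mkdtemp()
two_step_path = os.path.join(workdir, "two_step.txt")  # hypothetical file name
one_step_path = os.path.join(workdir, "one_step.txt")  # hypothetical file name

# Two explicit steps: encode the text to bytes, then write the bytes.
data = text.encode("utf-8")
with open(two_step_path, "wb") as f:
    f.write(data)

# One step: codecs.open() returns a file object that encodes on write.
with codecs.open(one_step_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read both files back as raw bytes and compare them.
with open(two_step_path, "rb") as f:
    two_step_bytes = f.read()
with open(one_step_path, "rb") as f:
    one_step_bytes = f.read()

print(two_step_bytes == one_step_bytes)
```

Either way the encoding step still takes place; the second form only moves it inside the file object.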

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread jmfauth
On 6 Oct, 06:39, Greg gregor.hochsch...@googlemail.com wrote: Brilliant! It worked. Thanks! Here is the final code for those who are struggling with similar problems: ## open and decode file # In this case, the encoding comes from the charset argument in a meta tag # e.g. meta

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread xDog Walker
On Thursday 2011 October 06 10:41, jmfauth wrote: or (Python2/Python3)

    >>> import io
    >>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
    ...     r = f.read()
    ...
    >>> repr(r)
    u'a\nb\nc\n'
    >>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
    ...     t = f.write(r)
    ...
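
A small sketch of what the 'utf-8-sig' codec in the session above does on output: it prepends the UTF-8 byte order mark before the encoded text. The temporary paths and the sample content are assumptions made so the example is self-contained; only the io.open() calls mirror the quoted session.

```python
import io
import os
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "abc.txt")  # hypothetical input file
dst_path = os.path.join(workdir, "def.txt")  # hypothetical output file

# Create a small ISO-8859-2 input file (sample content, assumed).
with io.open(src_path, "wb") as f:
    f.write(u"za\u0142\n".encode("iso-8859-2"))

# Decode on read: bytes on disk become Unicode text in memory.
with io.open(src_path, "r", encoding="iso-8859-2") as f:
    r = f.read()

# Encode on write; newline="\n" keeps line endings unchanged on every platform.
with io.open(dst_path, "w", encoding="utf-8-sig", newline="\n") as f:
    f.write(r)

# Inspect the raw bytes that ended up in the output file.
with io.open(dst_path, "rb") as f:
    raw = f.read()

print(repr(raw[:3]))                  # the three BOM bytes written by utf-8-sig
print(raw[3:].decode("utf-8") == r)   # the rest is plain UTF-8
```

Plain 'utf-8' would write the same bytes without the leading mark, which is often the safer choice when other tools will read the file.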

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread John Gordon
In mailman.1785.1317928997.27778.python-l...@python.org xDog Walker thud...@gmail.com writes: What is this io of which you speak? It was introduced in Python 2.6.

encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Greg
Hi, I am having some encoding problems when I first parse stuff from a non-English website using BeautifulSoup and then write the results to a txt file. I have the text both as a normal (text) and as a unicode string (utext): print repr(text) 'Branie zak\xc2\xb3adnik\xc3\xb3w' print repr(utext)
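
One possible reading of the byte string quoted above, offered as a sketch rather than a diagnosis: the bytes are valid UTF-8, but they look like the result of decoding ISO-8859-2 page data as Latin-1 and then re-encoding it as UTF-8. The code below, written for Python 3, reverses those two assumed steps. The charset iso-8859-2 is taken from the follow-up messages in this thread; the Latin-1 step is a guess.

```python
# Byte string as quoted in the post.
text = b"Branie zak\xc2\xb3adnik\xc3\xb3w"

# Step 1: the bytes decode cleanly as UTF-8, but to the wrong characters.
as_utf8 = text.decode("utf-8")

# Step 2 (assumption): undo a mistaken Latin-1 decode to recover the raw page bytes.
raw = as_utf8.encode("latin-1")

# Step 3 (assumption): decode the raw bytes with the page's declared charset.
repaired = raw.decode("iso-8859-2")

print(ascii(as_utf8))
print(repr(raw))
print(ascii(repaired))
```

If the round trip yields a plausible word in the page's language, that supports the guess; if it raises an error or produces more noise, the damage happened some other way.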

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Steven D'Aprano
On Wed, 05 Oct 2011 16:35:59 -0700, Greg wrote: Hi, I am having some encoding problems when I first parse stuff from a non-English website using BeautifulSoup and then write the results to a txt file. If you haven't already read this, you should do so:

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Greg
Brilliant! It worked. Thanks! Here is the final code for those who are struggling with similar problems: ## open and decode file # In this case, the encoding comes from the charset argument in a meta tag # e.g. meta charset=iso-8859-2 fileObj = open(filePath, "r").read() fileContent =
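
The posted code is cut off in this listing, so the following is not Greg's code. It is only a minimal sketch of the idea described in the comments: take the charset named in a meta tag, decode the raw bytes with it, and re-encode as UTF-8. The helper sniff_meta_charset and the sample page are hypothetical, and the code assumes Python 3.

```python
import re


def sniff_meta_charset(raw_bytes, default="utf-8"):
    """Hypothetical helper: return the charset named in a meta tag, if any."""
    match = re.search(
        br"<meta[^>]*charset=[\"']?([A-Za-z0-9_\-]+)", raw_bytes, re.IGNORECASE
    )
    return match.group(1).decode("ascii") if match else default


# Sample page bytes encoded as ISO-8859-2 (made up for illustration).
page = (
    u"<html><head><meta charset=iso-8859-2></head>"
    u"<body>zak\u0142adnik\u00f3w</body></html>"
).encode("iso-8859-2")

encoding = sniff_meta_charset(page)         # charset declared by the page
file_content = page.decode(encoding)        # bytes -> Unicode text
utf8_bytes = file_content.encode("utf-8")   # Unicode text -> UTF-8 bytes, ready to write

print(encoding)
```

A declared charset can be wrong or missing, which is why the helper falls back to a default instead of failing.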

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Chris Angelico
On Thu, Oct 6, 2011 at 3:39 PM, Greg gregor.hochsch...@googlemail.com wrote: Brilliant! It worked. Thanks! Here is the final code for those who are struggling with similar problems: ## open and decode file # In this case, the encoding comes from the charset argument in a meta tag # e.g.