[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2010-04-15 Thread Daniel Haertle

Daniel Haertle haer...@uni-bonn.de added the comment:

I was tripped up by the same feature. In addition, the examples in the docs are 
currently wrong: at 
http://docs.python.org/dev/py3k/library/urllib.request.html#examples the output 
of f.read() is shown as a string instead of bytes. There I propose the change from 

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100))
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html

to

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm

The other examples need to be corrected in a similar way.
Even more importantly, the HOWTO "Fetch Internet Resources Using The urllib 
Package" needs to be corrected too.

In the documentation of urllib.request.urlopen I propose to add a sentence 
(after the paragraph "This function returns a file-like object...") explaining 
that reading the object returns bytes that need to be decoded to a string:
"Note that the method read() returns bytes that need to be decoded to a string 
using decode()."
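
A minimal sketch of the behavior the proposed note describes. The data: URL is an assumption used only so the example runs without network access; an http:// URL should behave the same way with respect to bytes and decoding.

```python
import urllib.request

# Assumption: a data: URL stands in for a real web page so this runs offline.
url = 'data:text/html;charset=utf-8,%3Chtml%3Ecaf%C3%A9%3C/html%3E'

with urllib.request.urlopen(url) as f:
    raw = f.read(100)            # read() returns bytes, not str

text = raw.decode('utf-8')       # explicit decoding yields a str

print(type(raw).__name__)        # bytes
print(type(text).__name__)       # str
print(text)
```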

--
nosy: +Danh
versions: +Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5419
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2010-04-15 Thread Senthil Kumaran

Changes by Senthil Kumaran orsent...@gmail.com:


--
assignee: georg.brandl -> orsenthil
nosy: +orsenthil
resolution:  -> accepted




[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2010-04-15 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

Yeah, there is an example in the tutorial that was changed recently along the 
lines suggested. 
(http://docs.python.org/dev/py3k/tutorial/stdlib.html#internet-access)
The other examples have to be changed too.

--




[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2010-04-15 Thread Senthil Kumaran

Senthil Kumaran orsent...@gmail.com added the comment:

Fixed in revision 80092 and merged into release31-maint in revision 80093. I am 
marking this as fixed and closed. If there are any similar issues at other 
places, we will address them as separate bugs.

--
resolution: accepted -> fixed
status: open -> closed




[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2009-04-22 Thread Daniel Diniz

Changes by Daniel Diniz aja...@gmail.com:


--
keywords: +easy
priority:  -> normal
stage:  -> needs patch
type:  -> behavior




[issue5419] urllib.request.open(someURL).read() returns a bytes object so writing it requires binary mode

2009-03-04 Thread Mitchell Model

New submission from Mitchell Model m...@acm.org:

There needs to be something somewhere in the documentation that makes 
the simple point that data coming in from the web is bytes not strings, 
which is a big change from Python 2, and that it needs to be manipulated 
as such, including writing in binary mode.

I am not sure what documentation should be changed, but I do think 
something is missing, because I just ran around in circles on this one 
for quite some time. Perhaps the Unicode HOWTO needs more information; 
possibly urllib.request does; maybe a combination of things has to be 
added to several documentation files. Here's what happened:

I wanted to read from a web page, make some string replacements, and 
save to a file, so I wrote code that boils down to something like:

with open('url.html', 'w') as fil:
    fil.write(urllib.request.urlopen(aURL).read()).replace(str1, str2)

The first thing that happened was an error telling me that I can't write 
bytes to a text stream, so I realized that read() was returning a bytes 
object, which makes sense.

So I converted it to a string, but that put a b' at the beginning of the 
file and a ' at the end! Bad.

Instead of str(thebytes) I did the proper thing: thebytes.decode(), and 
wrote that to the file.
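
The difference between the two conversions can be sketched as follows; the sample bytes are made up for illustration.

```python
data = b'<p>hello</p>'      # hypothetical response body

as_repr = str(data)         # the repr of the bytes object, b'' wrapper included
as_text = data.decode()     # the actual text (UTF-8 is the default codec)

print(as_repr)              # b'<p>hello</p>'
print(as_text)              # <p>hello</p>
```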

But then I found that non-ASCII characters created problems: they were 
saved in the file as \xNN\xNN or even three \x's, then displayed as 
garbage when the page was opened in a browser. 

So I tried decoding using different codecs but couldn't find one that 
worked for the é and the em dash that were in the response.
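
When decoding really is wanted, one possible approach (not something the report itself tried) is to ask the response which charset it declares instead of guessing codecs. This is only a sketch: the data: URL is an offline stand-in, and the 'utf-8' fallback is an assumption for responses that declare no charset.

```python
import urllib.request

# Assumption: a data: URL that declares its charset, so this runs offline.
url = 'data:text/html;charset=utf-8,caf%C3%A9%20%E2%80%94%20ok'

with urllib.request.urlopen(url) as response:
    # get_content_charset() reads the charset parameter of Content-Type;
    # it returns None when none is declared, hence the fallback.
    charset = response.headers.get_content_charset() or 'utf-8'
    text = response.read().decode(charset)

print(charset)
print(text)
```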

Finally I realized that the whole thing was a delusion: obviously 
urlopen responses have to return bytes objects, and adding 'b' to the 
'w' when opening the output file fixed everything. (I also had to change 
my replacement strings to bytes.)
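
The working approach described above can be sketched like this. The data: URL, the replacement values, and the temporary output path are assumptions made so the example is self-contained; the point is only that the data stays bytes from read() through replace() to a file opened with 'wb'.

```python
import os
import tempfile
import urllib.request

# Assumption: offline stand-in for a real page containing a non-ASCII character.
url = 'data:text/html;charset=utf-8,%3Cp%3Ecaf%C3%A9%3C/p%3E'
old, new = b'<p>', b'<div>'          # replacement operands must be bytes too

with urllib.request.urlopen(url) as response:
    body = response.read().replace(old, new)   # bytes in, bytes out

path = os.path.join(tempfile.mkdtemp(), 'url.html')
with open(path, 'wb') as fil:        # 'wb': the bytes are written unchanged
    fil.write(body)

with open(path, 'rb') as fil:
    saved = fil.read()

print(saved)
```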

I went back to the relevant documentation multiple times, including 
after I figured everything out, and I can't convince myself that it 
makes the connection anywhere between bytes coming in, manipulating the 
bytes as bytes, and writing out in binary. Yes, in retrospect this all 
makes sense and perhaps even should have been obvious, but I am quite 
sure I won't be the only experienced Python 2 programmer to trip over 
this when making the transition to Python 3.

I apologize in advance if the requested documentation exists and I 
didn't find it, in which case I would appreciate a pointer to where it 
is.

--
assignee: georg.brandl
components: Documentation
messages: 83179
nosy: MLModel, georg.brandl
severity: normal
status: open
title: urllib.request.open(someURL).read() returns a  bytes object so writing 
it requires binary mode
versions: Python 3.0, Python 3.1
