[issue20409] .readline() returned garble text

2014-01-27 Thread Xiaoqing Rong

New submission from Xiaoqing Rong:

I'm using Windows 8. I created file 'weird1.txt' (attached) from an Excel 
worksheet using save as Unicode Text (*.txt). And this happened when I used 
Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:19:30) [MSC v.1600 64 bit 
(AMD64)] on win32:

>>> handle = open('weird1.txt'); handle.readline()
'ÿþ\x00P\x006\x004\x00;\x00Y\x00A\x00L\x000\x000\x001\x00C\x00;\x00T\x00F\x00C\x003\x00;\x00
 
\x00S\x00G\x00D\x00I\x00D\x00:\x00S\x000\x000\x000\x000\x000\x000\x000\x000\x001\x00,\x00
 \x00C\x00h\x00r\x00 \x00I\x00 \x00f\x00r\x00o\x00m\x00 
\x001\x005\x001\x000\x000\x006\x00-\x001\x004\x007\x005\x009\x004\x00,\x001\x005\x001\x001\x006\x006\x00-\x001\x005\x001\x000\x009\x007\x00,\x00
 \x00r\x00e\x00v\x00e\x00r\x00s\x00e\x00 
\x00c\x00o\x00m\x00p\x00l\x00e\x00m\x00e\x00n\x00t\x00,\x00 
\x00V\x00e\x00r\x00i\x00f\x00i\x00e\x00d\x00 \x00O\x00R\x00F\x00,\x00 
\x00\x00L\x00a\x00r\x00g\x00e\x00s\x00t\x00 \x00o\x00f\x00 \x00s\x00i\x00x\x00 
\x00s\x00u\x00b\x00u\x00n\x00i\x00t\x00s\x00 \x00o\x00f\x00 \x00t\x00h\x00e\x00 
\x00R\x00N\x00A\x00 \x00p\x00o\x00l\x00y\x00m\x00e\x00r\x00a\x00s\x00e\x00 
\x00I\x00I\x00I\x00 
\x00t\x00r\x00a\x00n\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00 
\x00i\x00n\x00i\x00t\x00i\x00a\x00t\x00i\x00o\x00n\x00 
\x00f\x00a\x00c\x00t\x00o\x00r\x00 \x00c\x00o\x00m\x00p\x00l\x00e\x00x\x00 
\x00(\x00T\x00F\x00I\x00I\x00I\x00C\x00)\x00;\x00 \x00p\x00a\x00r\x00t\x00 
\x00o\x00f\x00 \x00t\x00h\x00e\x00 \x00T\x00a\x00u\x00B\x00 
\x00d\x00o\x00m\x00a\x00i\x00n\x00 \x00o\x00f\x00 
\x00T\x00F\x00I\x00I\x00I\x00C\x00 \x00t\x00h\x00a\x00t\x00 
\x00b\x00i\x00n\x00d\x00s\x00 \x00D\x00N\x00A\x00 \x00a\x00t\x00 
\x00t\x00h\x00e\x00 \x00B\x00o\x00x\x00B\x00 
\x00p\x00r\x00o\x00m\x00o\x00t\x00e\x00r\x00 \x00s\x00i\x00t\x00e\x00s\x00 
\x00o\x00f\x00 \x00t\x00R\x00N\x00A\x00 \x00a\x00n\x00d\x00 
\x00s\x00i\x00m\x00i\x00l\x00a\x00r\x00 \x00g\x00e\x00n\x00e\x00s\x00;\x00 
\x00c\x00o\x00o\x00p\x00e\x00\n'

Then I opened 'weird1.txt' in Notepad++ 6.5.2, created file 'weird2.txt' by 
copying the whole content of 'weird1.txt' into a new file and saved it in 
Notepad++ 6.5.2 (I wanted to attach 'weird2.txt' but only one attachment is 
allowed), and this happened:

>>> handle = open('weird2.txt'); handle.readline()
'P64;YAL001C;TFC3; SGDID:S1, Chr I from 151006-147594,151166-151097, 
reverse complement, Verified ORF, Largest of six subunits of the RNA 
polymerase III transcription initiation factor complex (TFIIIC); part of the 
TauB domain of TFIIIC that binds DNA at the BoxB promoter sites of tRNA and 
similar genes; coope\n'

I can't see any difference between the contents of 'weird1.txt' and 
'weird2.txt' using Notepad++ or the Windows Notepad. Maybe some experts could 
tell me what's going on here?

--
components: IDLE
files: weird1.txt
messages: 209452
nosy: m123orning
priority: normal
severity: normal
status: open
title: .readline() returned garble text
type: behavior
versions: Python 3.3
Added file: http://bugs.python.org/file33750/weird1.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20409
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20409] .readline() returned garble text

2014-01-27 Thread R. David Murray

R. David Murray added the comment:

The files use different encodings.  In the first case, I believe the first two 
bytes (shown as 'ÿþ', which don't appear in the second example) are the BOM.  
I'm not an expert, but I believe it is a utf-16 file (thus all the \x00 bytes).  
The second file is presumably utf-8, with no BOM.  Notepad++ detects both 
automatically.  For Python, you have to tell it to look for the BOM by 
specifying the appropriate codec in the open call (for example, 
encoding='utf-16').  This is because Python's philosophy is to not guess at the 
encoding of files (though it does have a default encoding, which depends on the 
platform and locale: often utf-8 on Unix-like systems, typically a code page 
such as cp1252 on Windows).
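
A minimal sketch of what specifying the codec looks like, assuming the file 
really is utf-16 with a BOM as guessed above.  The file name weird1_demo.txt and 
its one-line content are hypothetical stand-ins for the attachment, written 
here only so the example is self-contained:

```python
import codecs
import locale
import os
import tempfile

# Hypothetical stand-in for the attached file: one short line written as
# UTF-16 with a BOM (an assumption about how the attachment was saved).
sample = 'P64;YAL001C;TFC3\n'
path = os.path.join(tempfile.mkdtemp(), 'weird1_demo.txt')
with open(path, 'w', encoding='utf-16', newline='') as f:
    f.write(sample)

# Look at the raw bytes: a UTF-16 file written this way starts with a BOM.
with open(path, 'rb') as f:
    raw = f.read()
has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Naming the codec lets Python consume the BOM and decode the rest.
with open(path, encoding='utf-16') as f:
    line = f.readline()

# Decoding the same bytes with a single-byte codec leaves the BOM and the
# \x00 bytes in the string, which is the kind of garbling reported.
garbled = raw.decode('latin-1')

print(has_bom, repr(line))
print(repr(garbled[:8]))
# The encoding open() uses when none is given:
print(locale.getpreferredencoding(False))
```

The 'utf-16' codec picks the byte order from the BOM, so the same open call 
should work whichever byte order the file was written in.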

Questions like this are better directed to the python-list mailing list, by the 
way.

--
nosy: +r.david.murray
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed
