[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2009-01-01 Thread Georg Brandl

Changes by Georg Brandl ge...@python.org:


--
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4742
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Changes by Tarek Ziadé ziade.ta...@gmail.com:


--
assignee:  -> tarek
nosy: +tarek
priority:  -> normal
type:  -> crash




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Tarek Ziadé ziade.ta...@gmail.com added the comment:

Here's a status:

The problem is located in the codec that decodes the data (called by the
compile builtin).

It throws an error:

*** UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in
position 853: character maps to <undefined>

Which is caught by compile and translated into:

SyntaxError: unknown encoding: cp1252

So I see two problems:

1/ why compile throws such an error when there's a UnicodeDecodeError
2/ why compile works well under Py2, since 0x9d is not part of the
   cp1252 mapping

I have written a test that reproduces the problem, and I am still
investigating. If I can't find the problem I will ask for help on
python-dev because I have no knowledge of the compiler internals yet.
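
The claim that 0x9d is not part of the cp1252 mapping can be checked
directly. This is only a minimal sketch, assuming a current Python 3
interpreter rather than the 3.0 build discussed here; the variable names
are illustrative.

```python
# Byte 0x9d has no assignment in Python's cp1252 decoding table,
# so decoding it raises UnicodeDecodeError from the charmap codec.
try:
    b"\x9d".decode("cp1252")
    decoded_ok = True
    message = ""
except UnicodeDecodeError as exc:
    decoded_ok = False
    message = str(exc)

print(decoded_ok)
print(message)
```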




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin

John Machin sjmac...@users.sourceforge.net added the comment:

TWO POINTS:
(1) I am not very concerned about chars like \x9d which are not valid in
the declared encoding; I am more concerned with chars like \x93 and \x94
which *ARE* valid in the declared encoding. Please ensure that these
cases are included in tests.
(2) Please check your test data and test results. I get different
results. I have created a file x9d.py by making the minimal changes to
x94.py. For me, this blows up on bytecompiling with *both* 3.0
(UnicodeDecodeError, as expected) and 2.x (Syntax Error unknown encoding
cp1252, wrong message) -- see below.

byte-compiling C:\python30\Lib\site-packages\x9d.py to x9d.pyc
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    py_modules = ["foo3", "bar3", "x93", "x94", "x9d", "xa0b7"]
  File "C:\python30\lib\distutils\core.py", line 149, in setup
    dist.run_commands()
  File "C:\python30\lib\distutils\dist.py", line 942, in run_commands
    self.run_command(cmd)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install.py", line 571, in run
    self.run_command(cmd_name)
  File "C:\python30\lib\distutils\cmd.py", line 317, in run_command
    self.distribution.run_command(command)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install_lib.py", line 91, in run
    self.byte_compile(outfiles)
  File "C:\python30\lib\distutils\command\install_lib.py", line 125, in byte_compile
    dry_run=self.dry_run)
  File "C:\python30\lib\distutils\util.py", line 520, in byte_compile
    compile(file, cfile, dfile)
  File "C:\python30\lib\py_compile.py", line 137, in compile
    codestring = f.read()
  File "C:\python30\lib\io.py", line 1724, in read
    decoder.decode(self.buffer.read(), final=True))
  File "C:\python30\lib\io.py", line 1295, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position
64: character maps to <undefined>

byte-compiling C:\python26\Lib\site-packages\x9d.py to x9d.pyc
SyntaxError: ('unknown encoding: cp1252',
('C:\\python26\\Lib\\site-packages\\x9d.py', 0, 0, None))

byte-compiling c:\python25\Lib\site-packages\x9d.py to x9d.pyc
  File "c:\python25\Lib\site-packages\x9d.py", line 0
SyntaxError: ('unknown encoding: cp1252',
('c:\\python25\\Lib\\site-packages\\x9d.py', 0, 0, None))

Added file: http://bugs.python.org/file12492/x9d.py




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

On 2008-12-30 13:20, John Machin wrote:
> byte-compiling C:\python26\Lib\site-packages\x9d.py to x9d.pyc
> SyntaxError: ('unknown encoding: cp1252',
> ('C:\\python26\\Lib\\site-packages\\x9d.py', 0, 0, None))
>
> byte-compiling c:\python25\Lib\site-packages\x9d.py to x9d.pyc
>   File "c:\python25\Lib\site-packages\x9d.py", line 0
> SyntaxError: ('unknown encoding: cp1252',
> ('c:\\python25\\Lib\\site-packages\\x9d.py', 0, 0, None))
>
> Added file: http://bugs.python.org/file12492/x9d.py

FWIW, I've tried that file with Python 2.5 and 2.6 on my machine:

lemburg/tmp> python2.5 ~/bin/pycompile.py x9d.py
 compiling x9d.py -> x9d.pyc
   XXX <type 'exceptions.SyntaxError'>: unknown encoding: cp1252 (x9d.py, line 0)

lemburg/tmp> python2.6 ~/bin/pycompile.py x9d.py
 compiling x9d.py -> x9d.pyc
   XXX <type 'exceptions.SyntaxError'>: unknown encoding: cp1252 (x9d.py, line 0)

Note that the line number is wrong in both messages.

It is interesting that simply running the files gives a more correct
error message:

lemburg/tmp> python2.5 x9d.py
  File "x9d.py", line 2
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 0: character
maps to <undefined>

lemburg/tmp> python2.6 x9d.py
  File "x9d.py", line 2
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 0: character
maps to <undefined>

The character position is wrong again in both messages.

Needless to say, the encoding cp1252 is *not* unknown. It looks
like compile() causes the decoding error to be overwritten with a
misleading error message.

--
nosy: +lemburg




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Tarek Ziadé ziade.ta...@gmail.com added the comment:

Yup, here's the test I have written to demonstrate the problem. In any
case, compile doesn't behave the right way in the first place.

--
keywords: +patch
Added file: http://bugs.python.org/file12493/encoding.issue.patch




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Changes by Tarek Ziadé ziade.ta...@gmail.com:


Removed file: http://bugs.python.org/file12493/encoding.issue.patch




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Changes by Tarek Ziadé ziade.ta...@gmail.com:


Added file: http://bugs.python.org/file12494/encoding.issue.patch




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin

John Machin sjmac...@users.sourceforge.net added the comment:

(1) What am I supposed to infer from "Yup"?? That all of that \x9d stuff
was a mistake?

(2)
+    def tearDown(self):
+        pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc')
+        if os.path.exists(pyc_file):
+            os.patth.remove(pyc_file)

os.patth is novel :-)




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

This is a duplicate of issue4626.

Here, the content is correctly decoded with cp1252, then passed to
compile(); but compile() works on the internal utf-8 representation, and
tries to decode it again with cp1252!

Yes, the error message is overwritten. If I remove the code that sets
the unknown encoding exception, I get:

>>> compile(open("c:/temp/t1252.py", encoding="cp1252").read(),
... "t1252.py", "exec")
SyntaxError: 'charmap' codec can't decode byte 0x9d in position 35:
character maps to <undefined>

The 0x9d is easily explained:
>>> b"\x94".decode('cp1252').encode('utf8')
b'\xe2\x80\x9d'
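
The double decoding described above can be replayed step by step. This
is a minimal sketch of the byte-level effect only, assuming a current
Python 3 interpreter; it does not call compile() and the variable names
are illustrative.

```python
# Step 1: the source byte is decoded correctly with the declared encoding.
source_bytes = b"\x94"
text = source_bytes.decode("cp1252")  # RIGHT DOUBLE QUOTATION MARK, U+201D

# Step 2: the text is held as UTF-8 internally.
utf8_bytes = text.encode("utf-8")

# Step 3: decoding those UTF-8 bytes with cp1252 a second time fails,
# because the third byte (0x9d) is undefined in cp1252.
try:
    utf8_bytes.decode("cp1252")
    second_decode_failed = False
    failing_byte = None
except UnicodeDecodeError as exc:
    second_decode_failed = True
    failing_byte = utf8_bytes[exc.start]

print(utf8_bytes, second_decode_failed, hex(failing_byte))
```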

--
nosy: +amaury.forgeotdarc
superseder:  -> compile() doesn't ignore the source encoding when a string is passed in




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread Tarek Ziadé

Changes by Tarek Ziadé ziade.ta...@gmail.com:


--
resolution:  -> duplicate




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin

New submission from John Machin sjmac...@users.sourceforge.net:

File foo3.py is the [cut down (orig 87Kb)] output of the 2to3 conversion tool
and (coincidentally) is still valid 2.x syntax. There are no syntax
errors reported by any of the following:
   \python26\python -c "import foo3"
   \python26\python foo3.py
   \python26\python setup.py install
   \python30\python -c "import foo3"
   \python30\python foo3.py
However, the 3.0 install
   \python30\python setup.py install
produces:

[snip]
running install_lib
copying build\lib\foo3.py -> C:\python30\Lib\site-packages
byte-compiling C:\python30\Lib\site-packages\foo3.py to foo3.pyc
  File "C:\python30\Lib\site-packages\foo3.py", line 0
### Note also line 0 above ###
SyntaxError: unknown encoding: cp1252

Same happens if alternative name windows-1252 is used instead of cp1252.

NOTE: file foo3.py actually does have some non-ASCII characters (\xa0,
\x93, \x94), in comments. Another file (bar3.py) from the same package
contains \xb7 twice, but doesn't have the unknown encoding problem.
There are several other files in the same package that start with
"# -*- coding: windows-1252 -*-" (or cp1252, or even cp1251(!)) but have no
non-ASCII characters in them. They don't get this incorrect error
message either.
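
The scenario in this report can be approximated without distutils. The
following is a rough sketch, assuming a current Python 3 interpreter in
which byte-compiling such a file is expected to succeed. The file name
x94_demo.py and its contents are hypothetical stand-ins, not the attached
foo3.py.

```python
import os
import py_compile
import tempfile

# A cp1252-declared module with \x93, \x94 and \xa0 only inside a comment.
source = (
    b"# -*- coding: cp1252 -*-\n"
    b"# quotes: \x93text\x94 and a no-break space \xa0\n"
    b"value = 1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "x94_demo.py")  # hypothetical file name
    with open(src, "wb") as f:
        f.write(source)
    # doraise=True turns a compile failure into a PyCompileError
    # instead of printing a message to stderr.
    cfile = py_compile.compile(src, cfile=src + "c", doraise=True)
    compiled_ok = os.path.exists(cfile)

print(compiled_ok)
```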

--
components: Distutils
files: py3encbug.zip
messages: 78273
nosy: sjmachin
severity: normal
status: open
title: 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
versions: Python 3.0
Added file: http://bugs.python.org/file12445/py3encbug.zip




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin

John Machin sjmac...@users.sourceforge.net added the comment:

A clue:

>>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252')))
'\xa0\u201c\u201d\xb7'

Could be that it only happens where there's a cp1252 character that's
not in latin1; see files x93.py and x94.py (have problem) and xa0b7.py
(doesn't have problem).
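
The latin1 distinction in this clue can be tabulated for the four bytes
mentioned. This is a small sketch, assuming a current Python 3
interpreter; the in_latin1 dictionary is an illustrative name and the
check says nothing about the attached files themselves.

```python
# For each byte, decode it as cp1252 and record whether the resulting
# code point still fits in latin1 (U+0000 to U+00FF).
in_latin1 = {}
for byte in (0xA0, 0x93, 0x94, 0xB7):
    ch = bytes([byte]).decode("cp1252")
    in_latin1[byte] = ord(ch) <= 0xFF

for byte, fits in in_latin1.items():
    print(hex(byte), fits)
```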

Added file: http://bugs.python.org/file12446/py3encbug2.zip




[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin

Changes by John Machin sjmac...@users.sourceforge.net:


Removed file: http://bugs.python.org/file12445/py3encbug.zip
