[issue10329] trace.py and unicode in Python 3

2013-08-04 Thread Alexander Belopolsky

Changes by Alexander Belopolsky alexander.belopol...@gmail.com:


--
stage: test needed -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10329
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10329] trace.py and unicode in Python 3

2010-11-09 Thread Walter Dörwald

Walter Dörwald wal...@livinglogic.de added the comment:

> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> > ... it complicates handling of the output of trace.py.
> > For each file you have to do the encoding detection dance again ...
>
> What? You just have to call one function! tokenize.open() :-) Well, ok,
> it's not committed yet, but it looks like most people agree: #10335.

The problem is that the script that downloads and builds the Python
source and generates the HTML for http://coverage.livinglogic.de/ isn't
ported to Python 3 yet (and can't be ported easily). However *running*
the test suite of course uses the current Python checkout, so an option
that lets me specify which encoding trace.py/regrtest.py should output
would be helpful.

--




[issue10329] trace.py and unicode in Python 3

2010-11-08 Thread Walter Dörwald

Walter Dörwald wal...@livinglogic.de added the comment:

Using the original encoding of the Python source file might be the politically 
correct thing to do, but it complicates handling of the output of trace.py. For 
each file you have to do the encoding detection dance again. It would be great 
if I could specify which encoding trace.py uses (with the file's encoding being 
the default).

--




[issue10329] trace.py and unicode in Python 3

2010-11-08 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> ... it complicates handling of the output of trace.py.
> For each file you have to do the encoding detection dance again ...

What? You just have to call one function! tokenize.open() :-) Well, ok, it's 
not committed yet, but it looks like most people agree: #10335.
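
A minimal sketch of the one-call approach mentioned above, assuming a Python version in which tokenize.open() is available (3.2 or later). The sample module and its contents are made up for illustration.

```python
import os
import tempfile
import tokenize

# Write a small Latin-1 encoded module with a PEP 263 coding cookie.
# (The file name and contents are illustrative only.)
source = "# -*- coding: latin-1 -*-\nname = 'D\xf6rwald'\n"
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(source.encode("latin-1"))

# tokenize.open() detects the declared encoding and returns a
# read-only text-mode file object that decodes with it.
with tokenize.open(path) as f:
    text = f.read()

os.remove(path)
```

The caller never has to name the encoding; the decoded text matches the original string even though the file is not UTF-8.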

--




[issue10329] trace.py and unicode in Python 3

2010-11-07 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

For the record, the test failure can be reproduced by the following:

$ LANG=C ./python -m test.regrtest test_imp test_trace
[1/2] test_imp
[2/2] test_trace
/home/antoine/py3k/__svn__/Lib/unittest/case.py:402: ResourceWarning: unclosed file <_io.TextIOWrapper name='@test_11986_tmp/os.cover' encoding='ANSI_X3.4-1968'>
  result.addError(self, sys.exc_info())
test test_trace failed -- Traceback (most recent call last):
  File "/home/antoine/py3k/__svn__/Lib/test/test_trace.py", line 296, in test_coverage
    self._coverage(tracer)
  File "/home/antoine/py3k/__svn__/Lib/test/test_trace.py", line 291, in _coverage
    r.write_results(show_missing=True, summary=True, coverdir=TESTFN)
  File "/home/antoine/py3k/__svn__/Lib/trace.py", line 334, in write_results
    lnotab, count)
  File "/home/antoine/py3k/__svn__/Lib/trace.py", line 384, in write_results_file
    outfile.write(line.expandtabs(8))
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 5: ordinal not in range(128)

1 test OK.
1 test failed:
test_trace


There's a strange interaction between test_imp and test_trace, it seems. Not 
sure why.

--
stage: unit test needed -> needs patch




[issue10329] trace.py and unicode in Python 3

2010-11-07 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

$ LANG=C ./python -m test.regrtest test_imp test_trace
[1/2] test_imp
[2/2] test_trace
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 5: 
ordinal not in range(128)

issue10329.diff fixes this failure. The failure comes from a non-breaking space 
that I introduced by mistake in Lib/os.py; it was the only non-ASCII character 
in that file. r86302 removes it.

I committed issue10329.diff to Python 3.2 as r86303: thanks Alex ;-)
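
A stray character like that non-breaking space is easy to miss by eye. The following is only a sketch of how one might locate such characters; find_non_ascii is a hypothetical helper, not something in the standard library or in the patch.

```python
def find_non_ascii(text):
    """Return (line number, column, character) for each non-ASCII character.

    Hypothetical helper for illustration. Lines are split on '\n' only,
    so that unusual Unicode line separators are reported, not consumed.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


# A made-up sample with a non-breaking space (U+00A0) after the '='.
sample = "import sys\nx =\xa01\n"
hits = find_non_ascii(sample)
```

Line numbers are 1-based and columns are 0-based here; that convention is an arbitrary choice for the sketch.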

--
resolution:  -> fixed
status: open -> closed




[issue10329] trace.py and unicode in Python 3

2010-11-07 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Reopening as a reminder to add a unit test for this case.

--
stage: needs patch -> unit test needed
status: closed -> open




[issue10329] trace.py and unicode in Python 3

2010-11-06 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> 1. It opens the source file one more time.  This is probably acceptable
> because existing code already opens it at least four times when -m (show
> missing) option is selected.  (Twice in find_executable_linenos() and
> twice in linecache.getlines().)  Fixing that would require refactoring of
> linecache code.

Creating a function like linecache.getencoding() seems to be overkill.

I created issue #10335 to add a function tokenize.open_python(): open a Python 
script in read mode without opening the file twice and get the encoding with 
detect_encoding(). This issue is more generic than trying to optimize the 
trace module.

> 2. This will not work for source code not stored in a file, but provided by
> a __loader__.get_source() method.  However it looks like trace will not
> work at all in this case, so fixing that is a separate issue.

For this case, I think that we can add a try/except IOError with a fallback to 
encoding = 'utf-8'.
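
The fallback described above could look roughly like this. It is a sketch only: source_encoding is a hypothetical helper name, and it relies on tokenize.detect_encoding() and not on the not-yet-committed function from #10335.

```python
import tokenize


def source_encoding(filename, default="utf-8"):
    """Return the declared encoding of a Python source file.

    Hypothetical helper: falls back to `default` when the file cannot
    be opened, e.g. when the source is provided by a loader and does
    not exist on disk.
    """
    try:
        with open(filename, "rb") as f:
            encoding, _ = tokenize.detect_encoding(f.readline)
    except OSError:  # IOError is an alias of OSError since Python 3.3
        return default
    return encoding


# A path that does not exist takes the fallback branch.
missing = source_encoding("no_such_dir/no_such_module.py")
```

A file with an invalid coding cookie would still raise SyntaxError from detect_encoding(); whether to catch that as well is a separate decision.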

--




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Walter Dörwald

New submission from Walter Dörwald wal...@livinglogic.de:

It seems that on Python 3 (i.e. the py3k branch) trace.py cannot handle source 
that includes non-ASCII characters. Running the test suite with code coverage 
info via

   ./python Lib/test/regrtest.py -T -N -uurlfetch,largefile,network,decimal

sometimes fails with the following exception:

Traceback (most recent call last):
  File "Lib/test/regrtest.py", line 1500, in <module>
    main()
  File "Lib/test/regrtest.py", line 696, in main
    r.write_results(show_missing=True, summary=True, coverdir=coverdir)
  File "/home/coverage/python/Lib/trace.py", line 319, in write_results
    lnotab, count)
  File "/home/coverage/python/Lib/trace.py", line 369, in write_results_file
    outfile.write(line.expandtabs(8))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 30: ordinal not in range(128)

The script that produces the code coverage info on 
http://coverage.livinglogic.de/ relies on this feature.

Applying the attached patch (i.e. specifying an explicit encoding when opening 
the output file) fixes the problem.
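
The failure and the effect of an explicit encoding can be reproduced outside trace.py. This is a sketch and not the attached patch: write_cover is a hypothetical stand-in for the writing step, and "ascii" is passed explicitly to imitate what a C locale would select by default.

```python
import os
import tempfile

# One annotated source line containing a non-ASCII character
# ('\xe4' is a-umlaut). The content is made up for illustration.
line = "    1: s = 'M\xe4dchen'\n"


def write_cover(path, text, encoding):
    """Write text to path using the given output encoding."""
    with open(path, "w", encoding=encoding) as outfile:
        outfile.write(text.expandtabs(8))


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.cover")

# With an ASCII output encoding the write fails.
try:
    write_cover(path, line, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With an encoding that can represent the character, it succeeds.
write_cover(path, line, "utf-8")
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```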

--
files: trace.diff
keywords: patch
messages: 120506
nosy: doerwalter, haypo
priority: normal
severity: normal
status: open
title: trace.py and unicode in Python 3
Added file: http://bugs.python.org/file19505/trace.diff




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
components: +Library (Lib)
nosy: +belopolsky
stage:  -> patch review
type:  -> behavior
versions: +Python 3.1, Python 3.2





[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Nick Coghlan

Changes by Nick Coghlan ncogh...@gmail.com:


--
nosy: +ncoghlan




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

I don't think trace.diff is proposed for commit.  I see it more as a supporting 
file for diagnosing the problem.

I see two problems here:

1. Apparently OP's system opens files with encoding set to 'ascii' by default. 
This is not the case on any of the systems I have access to (OSX and Linux).  I 
will try to reproduce this issue by setting LANG=en_US.ascii.

2. Regrtest attempts to write a non-ASCII character into the trace results file. 
 I suspect this comes from test cases that test import from modules with 
non-ASCII names or with non-ASCII identifiers.

I am not sure there is anything we need to change here other than possibly 
skipping tests that use non-ASCII identifiers on systems with the default 
encoding set to ASCII.  I would be +0 on adding errors='replace' or 
'backslashreplace' to the open() call in write_results_file(), but hardcoding 
encoding=utf-8 is definitely not the right thing to do.
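
For reference, here is how the two error handlers mentioned above behave on a character that ASCII cannot represent. The sketch uses str.encode() for brevity; open() accepts the same errors argument. The sample line is made up.

```python
# A line with one non-ASCII character ('\xe9' is e-acute).
line = "caf\xe9 = 1\n"

# The default handler, errors='strict', raises on the first
# character the codec cannot encode.
strict_failed = False
try:
    line.encode("ascii")
except UnicodeEncodeError:
    strict_failed = True

# 'replace' substitutes a question mark, losing the character.
replaced = line.encode("ascii", errors="replace")

# 'backslashreplace' substitutes a Python-style escape sequence,
# so the original character can still be identified.
escaped = line.encode("ascii", errors="backslashreplace")
```

Either handler avoids the exception, but only 'backslashreplace' keeps enough information to tell which character was in the source.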

--
stage: patch review -> unit test needed




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> I would be +0 on adding errors='replace' or 'backslashreplace' to the
> open() call in write_results_file(), but hardcoding encoding=utf-8
> is definitely not the right thing to do.

Who are the consumers of the trace files? Is there a formal specification or is 
Python the primary consumer?
If the former, then follow the specification (and/or amend it ;-)).
If the latter, you have the right to be creative; then utf-8 sounds like the 
most reasonable choice (possibly with an error handler such as ignore or 
replace to avoid barfing on lone surrogates).

Relying on the default encoding is not really a good idea, though. This is good 
for quick scripts or in the rare cases where it is by definition the expected 
behaviour. But in more elaborate cases you certainly want to decide the 
encoding by yourself.

--
nosy: +pitrou




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Fri, Nov 5, 2010 at 2:43 PM, Antoine Pitrou rep...@bugs.python.org wrote:
..
> Who are the consumers of the trace files? Is there a formal specification
> or is Python the primary consumer?

The trace files contain annotated python source code.  There is no
formal specification that I am aware of as these files are intended
for human consumption.

..
> Relying on the default encoding is not really a good idea, though. This is
> good for quick scripts or in the rare cases where it is by definition the
> expected behaviour. But in more elaborate cases you certainly want to
> decide the encoding by yourself.

I agree and the correct encoding seems to be the encoding of the
original source file that trace annotates.

--




[issue10329] trace.py and unicode in Python 3

2010-11-05 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

The attached patch, issue10329.diff, fixes the issue by setting the encoding of 
the coverage file to that of the source file.  I am not 100% happy with this 
patch for the following reasons:

1. It opens the source file one more time.  This is probably acceptable because 
existing code already opens it at least four times when the -m (show missing) 
option is selected.  (Twice in find_executable_linenos() and twice in 
linecache.getlines().)  Fixing that would require refactoring of linecache code.

2. This will not work for source code not stored in a file, but provided by a 
__loader__.get_source() method.  However it looks like trace will not work at 
all in this case, so fixing that is a separate issue.
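
The idea of reusing the source file's encoding for the coverage file can be sketched as follows. This is not the contents of issue10329.diff: annotate is a hypothetical function, the fixed "    1: " prefix stands in for real execution counts, and the extra open of the source file mirrors point 1 above.

```python
import os
import tempfile
import tokenize


def annotate(source_path, cover_path):
    """Copy source lines to cover_path with a placeholder count prefix,
    writing the output in the source file's own declared encoding."""
    # Extra open of the source, in binary mode, only to find the encoding.
    with open(source_path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    with open(source_path, encoding=encoding) as src:
        with open(cover_path, "w", encoding=encoding) as out:
            for line in src:
                out.write("    1: " + line.expandtabs(8))
    return encoding


# Made-up Latin-1 source file for the demonstration.
tmpdir = tempfile.mkdtemp()
source_path = os.path.join(tmpdir, "example.py")
cover_path = os.path.join(tmpdir, "example.cover")
with open(source_path, "wb") as f:
    f.write("# -*- coding: latin-1 -*-\ns = '\xe4'\n".encode("latin-1"))

used_encoding = annotate(source_path, cover_path)
with open(cover_path, "rb") as f:
    cover_bytes = f.read()
```

The coverage file then contains the same byte for the a-umlaut as the source does, so a tool that already knows the source encoding can read both.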

--
assignee:  -> belopolsky
keywords: +needs review
Added file: http://bugs.python.org/file19517/issue10329.diff
