[issue719888] tokenize module w/ coding cookie

2008-04-22 Thread Trent Nelson

Trent Nelson [EMAIL PROTECTED] added the comment:

This was fixed in trunk in r61573, and merged to py3k in r61982.

--
status: open -> closed


Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue719888

___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Michael Foord

Michael Foord [EMAIL PROTECTED] added the comment:

Made quite extensive changes to tokenize.py (with tests) for Py3k. This
migrates it to a 'bytes' API so that it can correctly decode Python
source files following PEP-0263.
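
For readers following along, here is a minimal sketch of what a bytes-based API looks like from the caller's side. It uses the tokenize module as shipped in current CPython 3 releases, which may differ in detail from the patch attached here; the sample source is made up for illustration.

```python
import io
import tokenize

# Source bytes as they would sit on disk: a PEP 263 cookie declaring
# latin-1, followed by a string literal holding a non-ASCII character.
source = "# -*- coding: latin-1 -*-\ns = '\u00e9'\n".encode("latin-1")

# tokenize.tokenize() expects a readline callable that returns bytes.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# The first token reports the detected encoding; later tokens carry str.
detected = tokens[0].string
string_literals = [t.string for t in tokens if t.type == tokenize.STRING]
```

The caller never decodes anything itself: the cookie is read from the raw bytes and the token strings come back already decoded.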

--
nosy: +fuzzyman
Added file: http://bugs.python.org/file9735/tokenize.zip





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

Michael, is the disappearance of the generate_tokens function in the new 
version of tokenize.py intentional?





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Michael Foord

Michael Foord [EMAIL PROTECTED] added the comment:

That was 'by discussion with wiser heads than I'. The existing module
has an old backwards compatibility interface called 'tokenize'. That can
be deprecated in 2.6.

As 'tokenize' is really the ideal name for the main entry point for the
module, 'generate_tokens' became tokenize for Py3.
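
A small sketch of the two entry points side by side, assuming a current CPython 3 release, where a str-based generate_tokens() is available alongside the bytes-based tokenize(). That is the present-day state of the module and not necessarily what this patch contained.

```python
import io
import tokenize

text = "x = 1\n"

# tokenize(): readline returns bytes; the stream starts with ENCODING.
from_bytes = list(tokenize.tokenize(io.BytesIO(text.encode("utf-8")).readline))

# generate_tokens(): readline returns str; no ENCODING token is emitted.
from_text = list(tokenize.generate_tokens(io.StringIO(text).readline))

bytes_names = [tokenize.tok_name[t.type] for t in from_bytes]
text_names = [tokenize.tok_name[t.type] for t in from_text]
```

Apart from the leading ENCODING token, the two streams have the same token types for this input.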





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

Is it worth keeping generate_tokens as an alias for tokenize, just
to avoid gratuitous 2-to-3 breakage?  Maybe not---I guess they're
different beasts, in that one wants a string-valued iterator and the 
other wants a bytes-valued iterator.

So if I understand correctly, the readline argument to tokenize
would have to return bytes instances.  Would it be worth adding a check
for this, to catch possible misuse?  You could put the check in 
detect_encoding, so that it just checks that the first one or two yields
from readline have the correct type, and assumes that the rest is okay.
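
To illustrate why "the first one or two yields" is the relevant window, here is a sketch against detect_encoding() as found in current CPython 3 releases. The helper name sniff is hypothetical, and the behaviour shown is that of today's module, assumed to match the intent of the patch.

```python
import io
import tokenize

def sniff(source: bytes):
    """Return (encoding, number of lines consumed) for raw source bytes."""
    encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding, len(consumed)

# A cookie on the second line (after a shebang) is still honoured.
second_line = sniff(b"#!/usr/bin/env python\n# coding: latin-1\nx = 1\n")

# A cookie on the third line is too late; the default encoding applies.
third_line = sniff(b"\n\n# coding: latin-1\nx = 1\n")
```

In both cases detect_encoding() calls readline at most twice, so a type check placed there would only ever see those lines.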





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

Sorry---ignore the last comment;  if readline() doesn't supply bytes
then the line.decode('ascii') will fail with an AttributeError.  So
there won't be silent failure.
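
A quick way to confirm the "no silent failure" point is sketched below. The helper detect_or_error is hypothetical, and the exact exception type is an assumption that depends on the Python version: the comment above expects AttributeError, while other versions of the module may raise TypeError instead, so the sketch accepts either.

```python
import io
import tokenize

def detect_or_error(readline):
    """Return the detected encoding, or the exception type name on failure."""
    try:
        return tokenize.detect_encoding(readline)[0]
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__

# A bytes-returning readline works.
ok = detect_or_error(io.BytesIO(b"x = 1\n").readline)

# A str-returning readline fails loudly instead of being accepted.
bad = detect_or_error(io.StringIO("x = 1\n").readline)
```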

I'll try thinking first and posting later next time.





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Trent Nelson

Trent Nelson [EMAIL PROTECTED] added the comment:

Tested patch on Win x86/x64 2k8, XP & FreeBSD 6.2, +1.

--
assignee:  -> Trent.Nelson
keywords: +patch





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

With the patch, 

./python.exe Lib/test/regrtest.py test_tokenize

fails for me with the following output:

Macintosh-2:py3k dickinsm$ ./python.exe Lib/test/regrtest.py test_tokenize
test_tokenize
test test_tokenize produced unexpected output:
**
*** lines 2-5 of actual output doesn't appear in expected output after line 1:
+ testing: /Users/dickinsm/python_source/py3k/Lib/test/tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
+ testing: /Users/dickinsm/python_source/py3k/Lib/test/tokenize_tests-no-coding-cookie-and-utf8-bom-sig-only.txt
+ testing: /Users/dickinsm/python_source/py3k/Lib/test/tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
+ testing: /Users/dickinsm/python_source/py3k/Lib/test/tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
**
1 test failed:
test_tokenize
[65880 refs]

I get something similar on Linux.





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Michael Foord

Michael Foord [EMAIL PROTECTED] added the comment:

If you remove the following line from the tests (which generates
spurious additional output on stdout) then the problem goes away:


print('testing: %s' % path, end='\n')





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Michael Foord

Michael Foord [EMAIL PROTECTED] added the comment:

*Full* patch (excluding the new dependent test text files) for Python 3.
Includes fixes for standard library and tools usage of tokenize.

If it breaks anything blame Trent... ;-)

--
versions:  -Python 2.6
Added file: http://bugs.python.org/file9741/tokenize_patch.diff





[issue719888] tokenize module w/ coding cookie

2008-03-18 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

All tests pass for me on OS X 10.5.2 and SuSE Linux 10.2 (32-bit)!





[issue719888] tokenize module w/ coding cookie

2008-03-17 Thread Trent Nelson

Trent Nelson [EMAIL PROTECTED] added the comment:

I've attached a patch to test_tokenize.py and a bunch of text files 
(that should be dropped into Lib/test) that highlight this issue a 
*lot* better than the current state of affairs.

The existing implementation defines roundup() in the doctest, then 
proceeds to define it again in the code body.  The last for loop in the 
doctest is failing every so often, and what it's failing on isn't at all 
clear: a) ten random files are selected out of 332 in Lib/test, and 
b) there's no way of figuring out which files are causing it to fail 
unless you hack another method into the test case to try and replicate 
what the doctest is doing, with some additional print statements.  That 
is the approach I took, only to get bitten by the fact that roundup() 
was being resolved to the bogus definition that's in the code body, not 
the functional one in the doctest, which resulted in even more 
misleading behaviour.

FWIW, the file that causes the exception is test_doctest2.py as it 
contains encoded characters.

So, the approach this patch takes is to drop the 'pick ten random test 
files and untokenize/tokenize' approach and add a class that 
specifically tests for the tokenizer's compliance with PEP 0263.
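
As a rough idea of what such a class can look like, here is a sketch modelled on the cookie/BOM combinations named by the test files in this thread. The class and method names are hypothetical, inline byte strings stand in for the attached text files, and the expected values reflect detect_encoding() in current CPython 3 releases, not necessarily the patch under review.

```python
import io
import unittest
from codecs import BOM_UTF8
from tokenize import detect_encoding

class CodingCookieTests(unittest.TestCase):  # hypothetical name
    def detect(self, source: bytes) -> str:
        # Feed raw bytes through a readline callable, as tokenize expects.
        return detect_encoding(io.BytesIO(source).readline)[0]

    def test_latin1_cookie_without_bom(self):
        src = b"# -*- coding: latin-1 -*-\nx = 1\n"
        self.assertEqual(self.detect(src), "iso-8859-1")

    def test_bom_only(self):
        self.assertEqual(self.detect(BOM_UTF8 + b"x = 1\n"), "utf-8-sig")

    def test_utf8_cookie_with_bom(self):
        src = BOM_UTF8 + b"# -*- coding: utf-8 -*-\nx = 1\n"
        self.assertEqual(self.detect(src), "utf-8-sig")

    def test_latin1_cookie_with_bom_is_rejected(self):
        # A UTF-8 BOM combined with a conflicting cookie is an error.
        src = BOM_UTF8 + b"# -*- coding: latin-1 -*-\nx = 1\n"
        with self.assertRaises(SyntaxError):
            self.detect(src)
```

Each case is deterministic, so a failure points at one specific encoding combination, which the random-file approach could not do. The class can be run with python -m unittest.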

I'll move on to a patch to tokenize.py now, but this patch is ok to 
commit now -- it'll clean up the misleading errors being reported by 
the plethora of red 3.0 buildbots at the moment at the very least.

--
nosy: +Trent.Nelson





[issue719888] tokenize module w/ coding cookie

2008-03-17 Thread Trent Nelson

Trent Nelson [EMAIL PROTECTED] added the comment:

Hmm, I take it multiple file uploads aren't supported.  I don't want to 
use svn diff for the text files as it looks like it's butchering the 
bom encodings, so, tar it is!  (Untar in root py3k/ directory.)

Added file: http://bugs.python.org/file9686/test_tokenize_patch.tar





[issue719888] tokenize module w/ coding cookie

2008-03-16 Thread Mark Dickinson

Mark Dickinson [EMAIL PROTECTED] added the comment:

This issue is currently causing test_tokenize failures in Python 3.0.
There are other ways to fix the test failures, but making tokenize honor 
the source file encoding seems like the right thing to do to me.

Does this still seem like a good idea to everyone?

--
nosy: +marketdickinson
versions: +Python 2.6, Python 3.0 -Python 2.3





[issue719888] tokenize module w/ coding cookie

2008-03-16 Thread Martin v. Löwis

Martin v. Löwis [EMAIL PROTECTED] added the comment:

In 3k, the tokenize module should definitely return strings, and, in 
doing so, it should definitely consider the encoding declaration (and 
also the default encoding in absence of the encoding declaration).

For 2.6, I wouldn't mind if it were changed incompatibly so that it 
returns Unicode strings, or else that it parses in Unicode, and then 
encodes back to the source encoding before returning anything.

