[issue12691] tokenize.untokenize is broken

2016-05-30 Thread Terry J. Reedy

Terry J. Reedy added the comment:

If there are, I can't remember.  This was one of 7 or 8 untokenize issues with 
about 5 separate bugs between them.  If there are any current untokenize issues 
not covered by some other open issue, then a new issue should be opened.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue12691] tokenize.untokenize is broken

2016-05-30 Thread R. David Murray

R. David Murray added the comment:

Is there anything left to do here?

--
nosy: +r.david.murray


[issue12691] tokenize.untokenize is broken

2014-02-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 8d6dd02a973f by Terry Jan Reedy in branch '3.3':
Issue #20750, Enable roundtrip tests for new 5-tuple untokenize. The
http://hg.python.org/cpython/rev/8d6dd02a973f

--


[issue12691] tokenize.untokenize is broken

2014-02-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0f0e9b7d4f1d by Terry Jan Reedy in branch '2.7':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/0f0e9b7d4f1d

New changeset 24b4cd5695d9 by Terry Jan Reedy in branch '3.3':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/24b4cd5695d9

--


[issue12691] tokenize.untokenize is broken

2014-02-18 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I fixed the assert and dropped-first-token (compat mode, iterator input) bugs one-by-one by 
writing narrow unittests that fail and code that makes them pass. I am now 
working on the '\' continuation issue. That is the subject of #9974, which has 
a nearly identical patch. Where it differs from yours, I will choose on the 
basis of tests. Any further discussion of this bug should be on that issue.

I appreciate the warning that the full mode is undertested, so I need to be 
concerned about breaking untested functionality that works. That was not so 
much a concern with the first two issues.

--


[issue12691] tokenize.untokenize is broken

2014-02-18 Thread Gareth Rees

Gareth Rees added the comment:

Thanks for your work on this, Terry. I apologise for the complexity of my 
original report, and will try not to do it again.

--


[issue12691] tokenize.untokenize is broken

2014-02-17 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
assignee: docs@python -> terry.reedy
nosy: +terry.reedy
stage: test needed -> patch review
versions: +Python 2.7, Python 3.3


[issue12691] tokenize.untokenize is broken

2014-02-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c896d292080a by Terry Jan Reedy in branch '2.7':
Untokenize: An logically incorrect assert tested user input validity.
http://hg.python.org/cpython/rev/c896d292080a

New changeset 51e5a89afb3b by Terry Jan Reedy in branch '3.3':
Untokenize: An logically incorrect assert tested user input validity.
http://hg.python.org/cpython/rev/51e5a89afb3b

--
nosy: +python-dev


[issue12691] tokenize.untokenize is broken

2014-02-17 Thread Terry J. Reedy

Terry J. Reedy added the comment:

The problem of the first iterator pair token being discarded is the subject of 
#8478. Consider that part of this issue as being closed as a duplicate.

The issue of a string being returned if there is no encoding should have been 
opened as a separate issue, and it was in #16223. I explain there why I think 
the behavior should be left as is and the docstring changed, and the doc 
clarified.
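The distinction at stake can be seen directly; a minimal sketch against the current tokenize API, illustrating the behaviour discussed in #16223:

```python
import io
import tokenize
from token import NAME

# Tokens from tokenize.tokenize() include an ENCODING token, so
# untokenize() returns bytes in that encoding.
with_encoding = tokenize.untokenize(
    tokenize.tokenize(io.BytesIO(b"x = 1\n").readline))

# A hand-built token list with no ENCODING token: untokenize()
# returns a str instead.
without_encoding = tokenize.untokenize([(NAME, 'x')])

print(type(with_encoding).__name__, type(without_encoding).__name__)
```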

--


[issue12691] tokenize.untokenize is broken

2014-02-06 Thread Gareth Rees

Gareth Rees added the comment:

I did some research on the cause of this issue. The assertion was
added in this change by Jeremy Hylton in August 2006:
https://mail.python.org/pipermail/python-checkins/2006-August/055812.html
(The corresponding Mercurial commit is here:
http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25).

At that point I believe the assertion was reasonable. I think it would
have been triggered by backslash-continued lines, but otherwise it
worked.

But in this change http://hg.python.org/cpython/rev/51e24512e305 in
March 2008 Trent Nelson applied this patch by Michael Foord
http://bugs.python.org/file9741/tokenize_patch.diff to implement PEP
263 and fix issue719888. The patch added ENCODING tokens to the output
of tokenize.tokenize(). The ENCODING token is always generated with
row number 0, while the first actual token is generated with row
number 1. So now every token stream from tokenize.tokenize() sets off
the assertion.

The lack of a test case for tokenize.untokenize() in full mode meant
that it was (and is) all too easy for someone to accidentally break it
like this.
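The row numbers involved are easy to observe; a small sketch using the current tokenize module:

```python
import io
import tokenize

# tokenize.tokenize() emits an ENCODING token first. Its start row is 0,
# while the first real token starts at row 1 -- which is what set off
# the assertion in add_whitespace() for every token stream.
tokens = list(tokenize.tokenize(io.BytesIO(b"1+1\n").readline))
for tok in tokens[:2]:
    print(tokenize.tok_name[tok.type], tok.start)
```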

--


[issue12691] tokenize.untokenize is broken

2014-02-05 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
assignee:  -> docs@python
components: +Documentation, Tests
nosy: +docs@python


[issue12691] tokenize.untokenize is broken

2014-02-05 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


Removed file: http://bugs.python.org/file33919/Issue12691.patch


[issue12691] tokenize.untokenize is broken

2014-02-05 Thread Yury Selivanov

Yury Selivanov added the comment:

Gareth,

Thanks a lot for such a comprehensive writeup and the patch. Please give me a 
day or two to do the review.

--


[issue12691] tokenize.untokenize is broken

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

Yury, let me see if I can move this issue forward. I clearly haven't
done a good job of explaining these problems, how they are related,
and why it makes sense to solve them together, so let me have a go
now.

1. tokenize.untokenize() raises AssertionError if you pass it a
   sequence of tokens output from tokenize.tokenize(). This was my
   original problem report, and it's still not fixed in Python 3.4:

  Python 3.4.0b3 (default, Jan 27 2014, 02:26:41) 
  [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import tokenize, io
  >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
  >>> tokenize.untokenize(t)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 317, in untokenize
      out = ut.untokenize(iterable)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 246, in untokenize
      self.add_whitespace(start)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 232, in add_whitespace
      assert row <= self.prev_row
  AssertionError

   This defeats any attempt to use the sequence:

  input code -> tokenize -> transform -> untokenize -> output code

   to transform Python code. But this ought to be the main use case
   for the untokenize function! That's how I came across the problem
   in the first place, when I was starting to write Minipy
   https://github.com/gareth-rees/minipy.

2. Fixing problem #1 is easy (just swap <= for >=), but it raises the
   question: why wasn't this mistake caught by test_tokenize? There's
   a test function roundtrip() whose docstring says:

  Test roundtrip for `untokenize`. `f` is an open file or a
  string. The source code in f is tokenized, converted back to
  source code via tokenize.untokenize(), and tokenized again from
  the latter. The test fails if the second tokenization doesn't
  match the first.

   If I don't fix the problem with roundtrip(), then how can I be
   sure I have fixed the problem? Clearly it's necessary to fix the
   test case and establish that it provokes the assertion.

   So why doesn't roundtrip() detect the error? Well, it turns out
   that tokenize.untokenize() has two modes of operation and
   roundtrip() only tests one of them.

   The documentation for tokenize.untokenize() is rather cryptic, and
   all it says is:

  Each element returned by the [input] iterable must be a token
  sequence with at least two elements, a token number and token
  value. If only two tokens are passed, the resulting output is
  poor.

   By reverse-engineering the implementation, it seems that it has two
   modes of operation.

   In the first mode (which I have called "compatibility mode" after
   the method Untokenizer.compat() that implements it) you pass it
   tokens in the form of 2-element tuples (type, text). These must
   have exactly 2 elements.

   In the second mode (which I have called "full mode" based on the
   description "full input" in the docstring) you pass it tokens in
   the form of tuples with 5 elements (type, text, start, end, line).
   These are compatible with the namedtuples returned from
   tokenize.tokenize().

   The full mode has the buggy assertion, but
   test_tokenize.roundtrip() only tests the compatibility mode.

   So I must (i) fix roundtrip() so that it tests both modes; (ii)
   improve the documentation for tokenize.untokenize() so that
   programmers have some chance of figuring this out in future!

3. As soon as I make roundtrip() test both modes it provokes the
   assertion failure. Good, so I can fix the assertion. Problem #1
   solved.

   But now there are test failures in full mode:

  $ ./python.exe -m test test_tokenize
  [1/1] test_tokenize
  **********************************************************************
  File "/Users/gdr/hg.python.org/cpython/Lib/test/test_tokenize.py", line ?, in test.test_tokenize.__test__.doctests
  Failed example:
      for testfile in testfiles:
          if not roundtrip(open(testfile, 'rb')):
              print("Roundtrip failed for file %s" % testfile)
              break
      else: True
  Expected:
      True
  Got:
      Roundtrip failed for file /Users/gdr/hg.python.org/cpython/Lib/test/test_platform.py
  **********************************************************************
  1 items had failures:
     1 of  73 in test.test_tokenize.__test__.doctests
  ***Test Failed*** 1 failures.
  test test_tokenize failed -- 1 of 78 doctests failed
  1 test failed:
      test_tokenize

   Examination of the failed tokenization shows that if the source
   
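On a Python where the assertion has been fixed, the two modes described in point 2 can be exercised side by side; a minimal sketch, comparing token streams by (type, text) since compatibility mode does not preserve exact spacing:

```python
import io
import tokenize

source = b"1 + 1\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# Full mode: pass the 5-element tuples through unchanged.
full = tokenize.untokenize(tokens)

# Compatibility mode: strip each token down to (type, text).
compat = tokenize.untokenize([(t.type, t.string) for t in tokens])

# Full mode reproduces this source exactly; compatibility mode only
# guarantees that the output tokenizes back to the same tokens.
print(full == source)
```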

[issue12691] tokenize.untokenize is broken

2014-02-04 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
nosy: +benjamin.peterson


[issue12691] tokenize.untokenize is broken

2014-01-31 Thread Yury Selivanov

Yury Selivanov added the comment:

bump?

--
nosy: +yselivanov
versions: +Python 3.4 -Python 3.2, Python 3.3


[issue12691] tokenize.untokenize is broken

2013-02-02 Thread Meador Inge

Meador Inge added the comment:

I will take a look.  As it stands the current patch fixes way too many
issues.  Patches should fix *one* issue at a time.  I will look at fixing
the original assert problem and at opening new issues for the others
(assuming there aren't already issues for them).

--


[issue12691] tokenize.untokenize is broken

2013-01-28 Thread Thomas Kluyver

Thomas Kluyver added the comment:

Is there anything I can do to push this forwards? I'm trying to use tokenize 
and untokenize in IPython, and for now I'm going to have to maintain our own 
copies of it (for Python 2 and 3), because I keep running into problems with 
the standard library module.

--
nosy: +takluyver


[issue12691] tokenize.untokenize is broken

2012-12-28 Thread Meador Inge

Changes by Meador Inge mead...@gmail.com:


--
nosy: +meador.inge


[issue12691] tokenize.untokenize is broken

2012-10-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka


[issue12691] tokenize.untokenize is broken

2011-08-05 Thread Daniel Urban

Changes by Daniel Urban urban.dani...@gmail.com:


--
nosy: +durban


[issue12691] tokenize.untokenize is broken

2011-08-05 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

I think I can make these changes independently and issue two patches, one 
fixing the problems with untokenize listed here, and another improving tokenize.

I've just noticed a third bug in untokenize: in full mode, it doesn't handle 
backslash-continued lines correctly.

Python 3.3.0a0 (default:c099ba0a278e, Aug  2 2011, 12:35:03) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> from tokenize import tokenize, untokenize
>>> untokenize(tokenize(BytesIO('1 and \\\n not 2'.encode('utf8')).readline))
b'1 andnot 2'
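On a current Python, where this was later fixed via #9974 (untokenize re-inserts a backslash+newline from the row information), the round trip preserves the token stream; a minimal check:

```python
import io
import tokenize

source = b"1 and \\\n not 2\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
result = tokenize.untokenize(tokens)

# Re-tokenizing the output yields the same (type, text) pairs as the
# input, so the tokens are no longer run together.
retok = list(tokenize.tokenize(io.BytesIO(result).readline))
print([(t.type, t.string) for t in tokens] ==
      [(t.type, t.string) for t in retok])
```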

--


[issue12691] tokenize.untokenize is broken

2011-08-05 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no 
ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an 
iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it asserted).
* In full mode, untokenize() successfully processes tokens that were separated 
by a backslashed newline in the original source (previously it ran these tokens 
together).

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested 
compatibility mode).

and updated the documentation:

* Update the docstring for untokenize to better describe its actual behaviour, 
and remove the false claim "Untokenized source will match input source 
exactly". (We can restore this claim if we ever fix tokenize/untokenize so that 
it's true.)
* Update the documentation for untokenize in tokenize.rst to match the 
docstring.

I welcome review: this is my first proper patch to Python.

--
keywords: +patch
Added file: http://bugs.python.org/file22842/Issue12691.patch


[issue12691] tokenize.untokenize is broken

2011-08-05 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

Thanks Ezio for the review. I've made all the changes you requested, (except 
for the re-ordering of paragraphs in the documentation, which I don't want to 
do because that would lead to the round-trip property being mentioned before 
it's defined). Revised patch attached.

--
Added file: http://bugs.python.org/file22844/Issue12691.patch


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Gareth Rees

New submission from Gareth Rees g...@garethrees.org:

tokenize.untokenize is completely broken.

Python 3.2.1 (default, Jul 19 2011, 00:09:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize, io
>>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
>>> tokenize.untokenize(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 250, in untokenize
    out = ut.untokenize(iterable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 179, in untokenize
    self.add_whitespace(start)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 165, in add_whitespace
    assert row <= self.prev_row
AssertionError

The assertion is simply bogus: the <= should be >=.

The reason why no-one has spotted this is that the unit tests for the tokenize 
module only ever call untokenize() in compatibility mode, passing in a 
2-tuple instead of a 5-tuple.

I propose to fix this, and add unit tests, at the same time as fixing other 
problems with tokenize.py (issue12675).

--
components: Library (Lib)
messages: 141634
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize.untokenize is broken
type: behavior
versions: Python 3.2, Python 3.3


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
stage:  -> test needed


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Sandro Tosi

Sandro Tosi sandro.t...@gmail.com added the comment:

Hi Gareth, would you like to provide a patch to fix the bug you spotted and add 
the corresponding test case to the test suite?

--
nosy: +sandro.tosi


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

See my last paragraph: I propose to deliver a single patch that fixes both this 
bug and issue12675. I hope this is OK. (If you prefer, I'll try to split the 
patch in two.)

I just noticed another bug in untokenize(): in compatibility mode, if 
untokenize() is passed an iterator rather than a list, then the first token 
gets discarded:

Python 3.3.0a0 (default:c099ba0a278e, Aug  2 2011, 12:35:03) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import untokenize
>>> from token import *
>>> untokenize([(NAME, 'hello')])
'hello '
>>> untokenize(iter([(NAME, 'hello')]))
''

No-one's noticed this because the unit tests only ever pass lists to 
untokenize().
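A narrow regression test for this (hypothetical test name; it simply asserts that list and iterator input agree in compatibility mode, which holds once the bug is fixed) might look like:

```python
import tokenize
from token import NAME

def test_untokenize_iterator_matches_list():
    # Compatibility mode must not discard the first token when the
    # input is an iterator rather than a list.
    tokens = [(NAME, 'hello')]
    assert tokenize.untokenize(iter(tokens)) == tokenize.untokenize(tokens)

test_untokenize_iterator_matches_list()
```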

--


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Sandro Tosi

Sandro Tosi sandro.t...@gmail.com added the comment:

The general rule would be to have separate patches. But in this case, if we 
have interdependent changes, then those should be packed into a single patch 
(e.g. if changes to tokenize break untokenize, then those parts should be 
joined).

--


[issue12691] tokenize.untokenize is broken

2011-08-04 Thread Eric Snow

Changes by Eric Snow ericsnowcurren...@gmail.com:


--
nosy: +ericsnow
