[issue18873] Encoding detected in non-comment lines

2014-10-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I haven't fixed all the bugs in handling the encoding cookie yet (there are 
separate issues for them). This issue can be closed; I'll open a new issue 
about the PEP when it is needed. The PEP should be corrected because it affects 
how other Python implementations and other tools handle this.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18873
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18873] Encoding detected in non-comment lines

2014-10-02 Thread Terry J. Reedy

Terry J. Reedy added the comment:

This looks like it could be closed.  We normally do not patch PEPs after they 
are implemented.  Does a corrected version of something in PEP 263 need to be 
added to the ref manual?

--
components:  -IDLE
versions: +Python 3.5 -Python 3.3




[issue18873] Encoding detected in non-comment lines

2013-09-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset f16855d6d4e1 by Serhiy Storchaka in branch '2.7':
Remove the use of non-existing re.ASCII.
http://hg.python.org/cpython/rev/f16855d6d4e1
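
For context, re.ASCII exists only in Python 3, where it restricts \w to ASCII 
characters for str patterns. The changeset above simply removes the flag on 
2.7. The following is a separate, minimal sketch of one way to write a pattern 
that compiles on both lines of Python; the names ASCII_FLAG and name_re are 
hypothetical and do not come from the changeset.

```python
import re

# re.ASCII is missing on Python 2.7, so fall back to 0 (no flag) there.
ASCII_FLAG = getattr(re, "ASCII", 0)

# Same character class as the encoding-name group of the cookie pattern.
name_re = re.compile(r"[-\w.]+", ASCII_FLAG)

ascii_name = name_re.match("utf-8").group()
```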

--




[issue18873] Encoding detected in non-comment lines

2013-09-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thanks, David.

--




[issue18873] Encoding detected in non-comment lines

2013-09-16 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 2dfe8262093c by Serhiy Storchaka in branch '3.3':
Issue #18873: The tokenize module, IDLE, 2to3, and the findnocoding.py script
http://hg.python.org/cpython/rev/2dfe8262093c

New changeset 6b747ad4a99a by Serhiy Storchaka in branch 'default':
Issue #18873: The tokenize module, IDLE, 2to3, and the findnocoding.py script
http://hg.python.org/cpython/rev/6b747ad4a99a

New changeset 3d46ef0c62c5 by Serhiy Storchaka in branch '2.7':
Issue #18873: IDLE, 2to3, and the findnocoding.py script now detect Python
http://hg.python.org/cpython/rev/3d46ef0c62c5

--
nosy: +python-dev




[issue18873] Encoding detected in non-comment lines

2013-09-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> If there is not now, it would be nice if there were just one python-coded 
> function in Lib/tokenize.py that could be imported and used by the other 
> python code.

Agreed. But look at how many tokenize issues are currently open.

Thank you for your report, Paul.

I have not fixed PEP 263 yet. Perhaps it needs rewording (especially in the 
light of other issues, such as #18960 and #18961).

--




[issue18873] Encoding detected in non-comment lines

2013-09-16 Thread Terry J. Reedy

Terry J. Reedy added the comment:

One of the problems with encoding recognition is that the same logic is 
reproduced, more or less, in multiple places, so any fix needs to be applied in 
each of them. From the detect_encoding_in_comments_only.patch:
Lib/idlelib/IOBinding.py
Lib/lib2to3/pgen2/tokenize.py
Lib/tokenize.py
Tools/scripts/findnocoding.py
Any fix for issues #18960 and #18961 may also need to be applied in multiple 
places.

If there is not now, it would be nice if there were just one python-coded 
function in Lib/tokenize.py that could be imported and used by the other python 
code. (I was going to suggest exposing the function in tokenize.c, but I 
believe the point of tokenize.py is to not be dependent on CPython.)
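
In Python 3, Lib/tokenize.py does expose a pure-Python 
detect_encoding(readline) function, so other pure-Python code could in 
principle call it rather than copy the logic. A minimal sketch of such a call 
follows; the wrapper name encoding_of is hypothetical, and whether each of the 
files listed above can actually depend on it is not established in this thread.

```python
import io
import tokenize

def encoding_of(source_bytes):
    """Return the source encoding that tokenize detects for these bytes."""
    # detect_encoding() takes a readline callable that yields bytes and
    # returns (encoding_name, list_of_lines_it_consumed).
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding

declared = encoding_of(b"# -*- coding: latin-1 -*-\nx = 1\n")
plain = encoding_of(b"coding=0\n")   # an assignment, not a cookie
```

On a Python 3 release that includes the fix from this issue, the second call 
falls back to the default encoding instead of raising SyntaxError.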

I believe the Idle support for \r became obsolete when support for MacOS9 was 
dropped in 2.4. I notice that it is not part of io universal newline support.

--




[issue18873] Encoding detected in non-comment lines

2013-09-16 Thread R. David Murray

R. David Murray added the comment:

This appears to be causing lib2to3 test failures on the buildbots, for example:

http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%202.7/builds/2319/steps/test/logs/stdio

http://buildbot.python.org/all/builders/PPC64%20PowerLinux%202.7/builds/206/steps/test/logs/stdio

--
nosy: +r.david.murray




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The tokenize module, 2to3, IDLE, and the Tools/scripts/findnocoding.py script 
are all affected by this bug. The proposed patch fixes it in all of these 
places and adds tests for tokenize and 2to3.

--
components: +Demos and Tools, IDLE, Library (Lib)
nosy: +georg.brandl, kbk, loewis, meador.inge, roger.serwy, terry.reedy
stage: needs patch -> patch review
Added file: http://bugs.python.org/file31645/detect_encoding_in_comments_only.patch




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

And here is a patch which fixes the regular expression in PEP 263.

--
Added file: http://bugs.python.org/file31646/pep0263_regex.diff




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
assignee:  -> serhiy.storchaka




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Nasty bug. Running a file with 'coding=0', a quite legitimate assignment 
statement, causes Idle to close with a LookupError, leading to a SyntaxError 
that is reported on the console if there is one ('crash' otherwise). (Idle 
closing is a separate problem, with its own issue, from the misinterpretation 
of 'coding'.)

Loading such a file works with a warning that should not be there.

Adding # leads to SyntaxError: unknown encoding in a message box, without 
closing Idle. I presume this is to be expected and is proper. There is also a 
warning on loading.

The code patch adds '^[ \t\f]' to the re. \f = FormFeed? Should that really be 
there? The PEP patch instead adds '^[ \t\v]', \v= VerticalTab? Same question, 
and why the difference?

Your other changes to IOBinding.coding_spec look correct and fix a couple of 
bugs in the function (searching all lines for the coding cookie, mangling a 
line without a line end).
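
The two function-level fixes mentioned above can be illustrated with a small 
sketch. It is not the code from IOBinding.py: the names coding_spec_sketch, 
COOKIE_RE and BLANK_RE are hypothetical, and the rule that a code line on 
line 1 ends the search is an assumption modelled on what 
tokenize.detect_encoding does in current Python 3.

```python
import re

# Assumed patterns, modelled on this thread rather than copied from the patch.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*coding[:=][ \t]*([-\w.]+)")
BLANK_RE = re.compile(r"^[ \t\f]*(?:[#\r\n]|$)")

def coding_spec_sketch(source):
    """Return the encoding declared in the first two lines, or None."""
    # splitlines() handles a final line that has no line ending.
    for line in source.splitlines()[:2]:
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1)
        if not BLANK_RE.match(line):
            # A line of code ends the search (assumption, see lead-in).
            return None
    return None
```

With this sketch, a cookie on line 3 or after a code line is ignored, and a 
cookie on an unterminated last line is still found.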

Someone else should review the other code changes.

--




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> The code patch adds '^[ \t\f]' to the re. \f = FormFeed? Should that really 
> be there? The PEP patch instead adds '^[ \t\v]', \v= VerticalTab? Same 
> question, and why the difference?

Good catch. I made a mistake in the PEP patch; it should be '\f' ('\014') in all cases.

Yes, it should be. It corresponds to the code in Parser/tokenizer.c.
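
A minimal sketch of how a pattern anchored with the '^[ \t\f]' prefix behaves. 
The exact pattern in the patch may differ; anchored_re and cookie are 
hypothetical names used only for illustration.

```python
import re

# Assumed shape: optional spaces, tabs, or form feeds, then '#', then the
# cookie. Anchoring at the line start means only comment lines can match.
anchored_re = re.compile(r"^[ \t\f]*#.*coding[:=][ \t]*([-\w.]+)")

def cookie(line):
    """Return the encoding named in a comment-only line, else None."""
    m = anchored_re.match(line)
    return m.group(1) if m else None

after_form_feed = cookie("\f# -*- coding: utf-8 -*-\n")
after_vertical_tab = cookie("\v# -*- coding: utf-8 -*-\n")
assignment = cookie("coding=0\n")
```

In this sketch a leading form feed is accepted, while a leading vertical tab 
is not, and the assignment line yields no cookie at all.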

--
Added file: http://bugs.python.org/file31655/pep0263_regex.diff




[issue18873] Encoding detected in non-comment lines

2013-09-07 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file31646/pep0263_regex.diff




[issue18873] Encoding detected in non-comment lines

2013-09-06 Thread Sergey Vishnikin

Sergey Vishnikin added the comment:

-cookie_re = re.compile("coding[:=]\s*([-\w.]+)")
+cookie_re = re.compile("#[^\r\n]*coding[:=]\s*([-\w.]+)")

The regex now matches only if the encoding expression is preceded by a comment 
character (#) on the same line.
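
A quick sketch of what this pattern accepts and rejects. The sample lines are 
made up for illustration. Note that the pattern is not anchored to the start of 
the line, so a '#' inside a string literal also satisfies it; that is an 
observation from running the pattern, not a point raised in this message.

```python
import re

# Pattern from the patch: a '#' must appear somewhere before the cookie.
patched_re = re.compile(r"#[^\r\n]*coding[:=]\s*([-\w.]+)")

assignment = "coding=0\n"
comment = "# -*- coding: utf-8 -*-\n"
string_literal = 's = "# coding=0"\n'   # '#' inside a string, not a comment

assignment_hits = patched_re.findall(assignment)
comment_hits = patched_re.findall(comment)
literal_hits = patched_re.findall(string_literal)
```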

--
keywords: +patch
nosy: +armicron
Added file: http://bugs.python.org/file31628/tokenizer.patch




[issue18873] Encoding detected in non-comment lines

2013-09-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It will fail on:

#coding=0

I'm wondering why findall() is used to match this regexp.
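
For context, findall() scans the whole line and returns every capture, while 
only one encoding name is needed. A small sketch of the difference, assuming 
the original unanchored pattern quoted in this thread; the sample line is 
invented for illustration.

```python
import re

cookie_re = re.compile(r"coding[:=]\s*([-\w.]+)")

line = "# coding: utf-8 (was coding: latin-1)\n"

all_hits = cookie_re.findall(line)   # every capture found on the line
first = cookie_re.search(line)       # stops at the first occurrence
first_name = first.group(1) if first else None
```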

--
nosy: +serhiy.storchaka




[issue18873] Encoding detected in non-comment lines

2013-08-29 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
keywords: +easy
nosy: +benjamin.peterson
stage:  -> needs patch
type: crash -> behavior
versions: +Python 3.4 -Python 3.1, Python 3.2




[issue18873] Encoding detected in non-comment lines

2013-08-28 Thread Paul Bonser

New submission from Paul Bonser:

lib2to3.pgen2.tokenize:detect_encoding looks for the regex 
coding[:=]\s*([-\w.]+) in the first two lines of the file without first 
checking if they are comment lines.

You can get 2to3 to fail with SyntaxError: unknown encoding: 0 with a single 
line file:

coding=0

A simple fix would be to check that the line is a comment before trying to look 
up the encoding from that line.
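
A minimal sketch of the failure described above, assuming the unanchored 
pattern quoted in this report and a codec lookup through the standard codecs 
module. The helper name naive_cookie is hypothetical and is not the lib2to3 
function itself.

```python
import codecs
import re

# The unanchored pattern quoted in the report.
cookie_re = re.compile(r"coding[:=]\s*([-\w.]+)")

def naive_cookie(line):
    """Hypothetical helper: return the 'encoding' found anywhere in the line."""
    matches = cookie_re.findall(line)
    return matches[0] if matches else None

name = naive_cookie("coding=0\n")    # an ordinary assignment, not a comment
try:
    codecs.lookup(name)
    known = True
except LookupError:
    known = False                    # no codec is registered under this name
```

Checking that the line is a comment before applying the pattern, as suggested 
above, would make naive_cookie return None for this input.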

--
components: 2to3 (2.x to 3.x conversion tool)
messages: 196435
nosy: Paul.Bonser
priority: normal
severity: normal
status: open
title: Encoding detected in non-comment lines
type: crash
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3
