[issue45692] IDLE: define word/id chars in one place.

2021-11-03 Thread Alex Waygood


Alex Waygood  added the comment:

The PR that proposes creating a new utility.py file is mine, linked to 
https://bugs.python.org/issue45447. Would it make things easier if I split it 
into two PRs: one adding an empty util.py file, and the other making my 
proposed changes to support syntax highlighting for .pyi files?

--
nosy: +AlexWaygood

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45692] IDLE: define word/id chars in one place.

2021-11-03 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +27639
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/29381




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

hyperparser.py, line 13, _ASCII_ID_CHARS
line 15, _ASCII_ID_FIRST_CHARS, frozenset(string.ascii_letters + "_")

Only used on lines 18 and 21 to create 128-item lookup tables.  The point is to 
be fast, as hyperparser scans multiple chars when invoked.  The expandword fix 
and c.isidentifier() could replace the lookup.  But would they bog down 
response time?  We need to look at hyperparser use cases and do some testing.
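
One way to start the testing mentioned above is a small timing comparison.  This is only a sketch: the table mirrors the 128-item lookup idea described in this message, but the names `is_id_char_table`, `is_id_char_unicode`, and `compare` are made up for illustration and are not hyperparser's own code.

```python
import string
import timeit

# Assumption: illustrative names, not the actual hyperparser definitions.
ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")
IS_ASCII_ID_CHAR = [chr(x) in ASCII_ID_CHARS for x in range(128)]


def is_id_char_table(c):
    """ASCII-only check through the precomputed 128-item table."""
    o = ord(c)
    return o < 128 and IS_ASCII_ID_CHAR[o]


def is_id_char_unicode(c):
    """Unicode-aware approximation built from str methods."""
    return c.isalnum() or c == "_"


def compare(text, repeat=5):
    """Return best-of-N timings in seconds for scanning text with each check."""
    table = min(timeit.repeat(
        lambda: [is_id_char_table(c) for c in text], number=10, repeat=repeat))
    uni = min(timeit.repeat(
        lambda: [is_id_char_unicode(c) for c in text], number=10, repeat=repeat))
    return table, uni


if __name__ == "__main__":
    sample = "self.some_attribute.method(arg1, arg2)" * 100
    print(compare(sample))
```

The numbers depend on the machine and Python version, so they only indicate whether the str-method version is in the same ballpark.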

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

editor.py, line 809, IDENTCHARS

Used in the immediately following def colorize_syntax_error on line 814.
    if char and char in self.IDENTCHARS:
        text.tag_add("ERROR", pos + " wordstart", pos)
I believe the intent is to color the part of an identifier that precedes the 
character marked.  At the moment, I cannot think of how to trigger such a 
situation.  I would have to add some console prints to investigate.  Maybe this 
should get the autoexpand fix, but maybe it is dead code.

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

undo.py, line 254, alphanumeric

Used in immediately following lines to classify chars as 'alphanumeric', 
'newline', or 'punctuation' (the default).  I believe I have only ever looked 
at this module to add the test code at the bottom.  In any case, I don't know 
the effect of calling non-ascii chars punctuation, but suspect it is not the 
best thing.  So I suspect that the autoexpand fix would be the best.

The classify method is only used on line 248 in the merge method above.
  self.classify(self.chars[-1]) != self.classify(cmd.chars)
merge() is only used on line 124 in addcmd.

To figure out more, I would experiment with identifiers without and with 
non-ascii chars, and with undo and redo, and see what difference there is.
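
For that experiment, the two classifications can be compared side by side.  This is a sketch based only on the description above (three classes, 'punctuation' as the default); the function names are hypothetical and the bodies are not copied from undo.py.

```python
import string

# The ASCII-only set quoted in this issue.
alphanumeric = string.ascii_letters + string.digits + "_"


def classify_ascii(c):
    """Classification as described above: non-ASCII falls through to punctuation."""
    if c in alphanumeric:
        return "alphanumeric"
    if c == "\n":
        return "newline"
    return "punctuation"


def classify_unicode(c):
    """Same three classes, with the autoexpand-style Unicode-aware check."""
    if c.isalnum() or c == "_":
        return "alphanumeric"
    if c == "\n":
        return "newline"
    return "punctuation"


if __name__ == "__main__":
    for ch in "aé_\n+":
        print(repr(ch), classify_ascii(ch), classify_unicode(ch))
```

The two functions should differ only for non-ASCII letters and digits, which is exactly the case the undo/redo experiment would need to exercise.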

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

autocomplete.py, line 33, ID_CHARS is only used on line 137 to find the prefix 
of an identifier when completions have been explicitly requested.

    while i and (curline[i-1] in ID_CHARS or ord(curline[i-1]) > 127):
        i -= 1
    comp_start = curline[i:j]

The completion is for a name or attribute depending on whether the preceding 
char, if any, is '.'.  Here, the unicode fix was to accept all non-ascii as 
possible id chars.  There is no harm as the completion box only has valid 
completions, and if the prefix given does not match anything, nothing is 
highlighted.

ID_CHARS could be moved to utils if the same ascii string is used in another 
module.
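
The prefix scan can be tried outside IDLE with a small standalone wrapper.  This is a sketch: the loop is the one quoted above, but the function `completion_prefix` and its return shape are invented here for illustration.

```python
import string

ID_CHARS = string.ascii_letters + string.digits + "_"


def completion_prefix(curline):
    """Return (comp_start, preceding_char) for the text left of the cursor.

    preceding_char is '' when the prefix starts at the beginning of the line.
    """
    i = j = len(curline)
    # Accept ASCII id chars, plus any non-ASCII char as a possible id char.
    while i and (curline[i-1] in ID_CHARS or ord(curline[i-1]) > 127):
        i -= 1
    comp_start = curline[i:j]
    preceding = curline[i-1] if i else ""
    return comp_start, preceding


if __name__ == "__main__":
    print(completion_prefix("obj.na"))   # attribute completion: preceded by '.'
    print(completion_prefix("x = fo"))   # name completion
```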

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

autoexpand.py, line 20, wordchars

'wordchars' is correct here since words beginning with digits can be expanded.

>>> s = '0x4f334'
>>> 0x4f334  # Hit alt-/ after 0 and enter
324404

Used in line 89 in method getprevword

    while i > 0 and line[i-1] in self.wordchars:
        i = i-1

Proposed replacement seems to work.

>>> i = len(s)
>>> while i > 0 and ((c := s[i-1]).isalnum() or c == '_'):
...     i -= 1
...
>>> i, c
(0, '0')
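
Wrapped as a function, the same scan can be exercised on a few more inputs.  This is a sketch, not the autoexpand.py method: `getprevword` here is a free function taking the line as an argument, and it assumes Python 3.8+ for the `:=` operator.

```python
def getprevword(line):
    """Return the run of word characters at the end of line ('' if none)."""
    i = len(line)
    # Unicode-aware replacement for `line[i-1] in wordchars`.
    while i > 0 and ((c := line[i-1]).isalnum() or c == "_"):
        i -= 1
    return line[i:]


if __name__ == "__main__":
    print(getprevword("x = 0x4f334"))
    print(getprevword("print(größe"))
```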

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

There have been occasional discussions about IDLE not being properly unicode 
aware in some of its functions.  Discussions have foundered on the following 
facts, and no fix has been made.

1. The direct replacement string, your 'identcontchars', seems too big. We have 
always assumed that O(n) linear scans would be too slow.
2. A frozen set should give O(1) lookup, likely fast enough, but would be even 
bigger.
3. The string methods operate on and scan through multiple chars, whereas IDLE 
wants to test 1 char at a time.
4. Even if the O(n*n) behavior of multiple calls is acceptable, there is no 
function for unicode continuation chars.  s.isidentifier() requires that the 
first character be a start char, which is to say, not a digit.  s.isalnum() is 
false for '_'.  (Otherwise, it would work.)

I would like to do better this time.  Possible responses to the blockers:

1. Correct; reject.

2. Maybe adding an elephant is better than keeping multiple IDLE features 
disabled for non-ascii users.  How big?

>>> import sys
>>> fz = frozenset(c for c in map(chr, range(0x110000)) if ('a'+c).isidentifier)
>>> sys.getsizeof(fz)
33554648

Whoops, each 2 or 4 byte slice of the underlying array becomes 76 bytes + 8 
bytes * size of hash array.  Not practical either.
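
For anyone repeating the measurement, here is a sketch with the predicate actually called: `.isidentifier()` needs the parentheses, since a bare method reference is always truthy and would admit every code point.  The helper name `id_continue_set` and its `limit` parameter are made up for illustration.  Note that `sys.getsizeof` reports only the set object itself, not the one-character strings it refers to.

```python
import sys


def id_continue_set(limit=0x110000):
    """Collect the chars below limit that may continue an identifier."""
    return frozenset(
        c for c in map(chr, range(limit)) if ("a" + c).isidentifier()
    )


if __name__ == "__main__":
    fz = id_continue_set()
    # Number of chars, and size in bytes of the set object alone.
    print(len(fz), sys.getsizeof(fz))
```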

3. For at least some of the uses, the repeated calls may be fast enough.

4. We can synthesize s.isidcontinue with "c.isalnum() or c == '_'".   
"c.isidentifier() or c.isdigit()" would also work but should be slower.

Any other ideas?  I will look at the use cases next.
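
The two synthesized tests from response 4 can be written out and compared with the check that asks the interpreter directly.  This is a sketch: the function names are hypothetical, and the first two are approximations that may disagree with the exact rule for some non-ASCII characters (combining marks, for example), so they are only compared over ASCII here.

```python
def is_id_continue(c):
    """Synthesized check: alphanumeric or underscore."""
    return c.isalnum() or c == "_"


def is_id_continue_alt(c):
    """Alternative synthesized check: identifier start char or digit."""
    return c.isidentifier() or c.isdigit()


def is_id_continue_exact(c):
    """Ask the interpreter: can c follow a valid start char in an identifier?"""
    return ("a" + c).isidentifier()


if __name__ == "__main__":
    ascii_chars = [chr(i) for i in range(128)]
    disagreements = [
        c for c in ascii_chars
        if not (is_id_continue(c) == is_id_continue_alt(c) == is_id_continue_exact(c))
    ]
    print(disagreements)
```

Extending the comparison loop to the full code point range would show exactly where the approximations diverge from the exact rule.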

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

I checked for other possible ascii only problems and only found

config_key.py: 14: ALPHANUM_KEYS = tuple(string.ascii_lowercase + string.digits)
config_key.py: 39: if 'Shift' in modifiers and key in string.ascii_lowercase:
config_key.py: 15: PUNCTUATION_KEYS = tuple('~!@#%^&*()_-+={}[]|;:,.<>/?')
config_key.py: 20: AVAILABLE_KEYS = (ALPHANUM_KEYS + PUNCTUATION_KEYS + FUNCTION_KEYS +

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

This is an interesting problem. I am going to work on it over the next weekends.

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Complete sets of characters which can be used in identifiers are too large:

>>> allchars = ''.join(map(chr, range(0x110000)))
>>> identstartchars = ''.join(c for c in allchars if c.isidentifier())
>>> identcontchars = ''.join(c for c in allchars if ('a' + c).isidentifier())
>>> len(identstartchars), len(identcontchars)
(131975, 135053)

--




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

This set is mostly outdated. In Python 2 it was a set of characters composing 
identifiers, but in Python 3 identifiers can contain non-ASCII characters.

--
nosy: +serhiy.storchaka




[issue45692] IDLE: define word/id chars in one place.

2021-11-02 Thread Terry J. Reedy


New submission from Terry J. Reedy :

IDLE currently defines the same set of chars in 5 places with 5 names. (Listed 
by Serhiy Storchaka in #45669.)
Lib/idlelib/autoexpand.py:20: wordchars = string.ascii_letters + string.digits + "_"
Lib/idlelib/undo.py:254: alphanumeric = string.ascii_letters + string.digits + "_"
Lib/idlelib/editor.py:809: IDENTCHARS = string.ascii_letters + string.digits + "_"
Lib/idlelib/hyperparser.py:13: _ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")
Lib/idlelib/autocomplete.py:33: ID_CHARS = string.ascii_letters + string.digits + "_"

I suspect that either a string or frozenset would work everywhere (check).  I 
will pick a name after checking this.  The single definition would go in the 
proposed utils.py, which is part of another issue and PR.
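
A single definition might look roughly like the following.  This is only a sketch: the names `ASCII_WORD_CHARS`, `ASCII_WORD_CHARSET`, and `is_ascii_word_char` are placeholders, since no name has been picked yet, and offering both a string and a frozenset is one possible way to serve every current caller.

```python
import string

# Hypothetical shared definition; the names below are placeholders.
ASCII_WORD_CHARS = string.ascii_letters + string.digits + "_"
ASCII_WORD_CHARSET = frozenset(ASCII_WORD_CHARS)


def is_ascii_word_char(c):
    """Membership test that callers could share instead of five copies."""
    return c in ASCII_WORD_CHARSET


if __name__ == "__main__":
    print(len(ASCII_WORD_CHARS), is_ascii_word_char("_"), is_ascii_word_char("-"))
```

Both `c in ASCII_WORD_CHARS` and `c in ASCII_WORD_CHARSET` give the same answer for a single character, so the five modules could import whichever form suits them.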

(Note: the utility tk index functions should also go there.)

--
assignee: terry.reedy
components: IDLE
messages: 405516
nosy: terry.reedy
priority: normal
severity: normal
stage: test needed
status: open
title: IDLE: define word/id chars in one place.
type: enhancement
versions: Python 3.11
