Terry J. Reedy added the comment:

I checked for usage: _id(_first)_chars is only used in _eat_identifier, which 
is used in one place in get_expression. That is called once each in 
AutoComplete and CallTips. Both are seldom visible except as requested (by 
waiting, for calltips). Calltips is only called on entry of '('. Is 
AutoComplete doing hidden checks?

_eat_identifier currently does a linear 'c in string' scan of the two character 
strings. I believe that both are long enough that O(1) 'c in set' checks would 
be faster. The sets could be augmented with latin1 id chars without becoming 
huge or slowing the check (see below). This is a change we could make as soon 
as the test file and new failing tests are ready.
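
A minimal sketch of the set-based check follows. The names ID_CHARS, 
ID_FIRST_CHARS and eat_identifier are illustrative stand-ins, not the actual 
idlelib attributes, and the function only approximates what _eat_identifier 
does (it ignores keywords, for instance). The latin1 additions are computed 
with str.isidentifier rather than listed by hand.

```python
import string

# ASCII identifier characters, as sets instead of strings.
ASCII_ID_FIRST = frozenset(string.ascii_letters + "_")
ASCII_ID = frozenset(string.ascii_letters + string.digits + "_")

# Extra latin1 characters (code points 128-255) that str.isidentifier
# accepts as a first char, or as a later char after a valid first char.
LATIN1 = [chr(i) for i in range(128, 256)]
LATIN1_ID_FIRST = frozenset(c for c in LATIN1 if c.isidentifier())
LATIN1_ID = frozenset(c for c in LATIN1 if ("a" + c).isidentifier())

ID_FIRST_CHARS = ASCII_ID_FIRST | LATIN1_ID_FIRST
ID_CHARS = ASCII_ID | LATIN1_ID


def eat_identifier(s, limit, pos):
    """Return the length of the identifier that ends just before s[pos].

    Scans backwards no further than index `limit`. Returns 0 if the run
    of identifier chars does not begin with a valid first char.
    """
    i = pos
    while i > limit and s[i - 1] in ID_CHARS:  # O(1) set membership
        i -= 1
    if i < pos and s[i] not in ID_FIRST_CHARS:
        return 0
    return pos - i
```

With these definitions, eat_identifier("foo.bar", 0, 7) finds the 3 chars of 
"bar", and a latin1 name such as "café" is recognized as well.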

I just discovered str.isidentifier, which is new in 3.x.
>>> '1'.isidentifier()
False
>>> 'a'.isidentifier()
True
>>> '\ucccc'
'쳌'
>>> '\ucccc'.isidentifier()
True

This is, however, meant to be used in the forward direction. If 
s[pos].isidentifier(), check s[pos:end].isidentifier(), where end is 
progressively incremented until the check fails. For backwards checking, it 
could be used with a start char prefixed: ('a'+s[start:pos]).isidentifier(). To 
limit the cost, start could be decremented 4 chars at a time, with 2 extra 
tests (binary search) on failure to find the actual start.
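
The backwards scan might look something like the sketch below. The function 
name identifier_start and its step parameter are made up for illustration. It 
relies on the prefixed check being monotone: if ('a'+s[k:pos]).isidentifier() 
holds for some k, it also holds for every larger k up to pos.

```python
def identifier_start(s, pos, step=4):
    """Return the smallest start such that every char of s[start:pos]
    can continue an identifier.

    The slice may still begin with a digit, so the caller should finish
    with s[start:pos].isidentifier().
    """
    def ok(start):
        # Prefix a valid first char so that only continuation rules apply.
        return ("a" + s[start:pos]).isidentifier()

    good = pos  # s[pos:pos] is empty, so ok(pos) holds trivially
    while good > 0:
        trial = max(good - step, 0)
        if ok(trial):
            good = trial
            continue
        # ok(trial) failed and ok(good) holds: binary search in between.
        lo, hi = trial, good
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if ok(mid):
                hi = mid
            else:
                lo = mid
        return hi
    return good
```

For example, with s = "x = spam_eggs1" and pos = len(s), the scan steps back 
to index 2, fails there, and the binary search settles on index 4, the start 
of "spam_eggs1".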

The 3.x versions of other isxyz functions could be useful: isalpha, isdecimal, 
isdigit, isnumeric. We just have to check their definitions against the two 
identifier class definitions.

What is slightly annoying is that in CPython 3.3+, all-ascii strings are marked 
as such but the info is not directly accessible without ctypes. I 
believe all-latin-1 strings can be detected by comparing sys.getsizeof(s) to 
len(s), so we could augment the char sets to include the extra identifier chars 
in latin-1.
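
As a portable alternative to the sys.getsizeof comparison, which depends on 
CPython's internal string layout, the two properties can be tested directly. 
This is only a sketch with made-up helper names. Note that str.isascii was 
added in Python 3.7, later than the releases discussed here, so a fallback is 
included for older versions.

```python
def is_ascii(s):
    """Return True if every char of s is below U+0080."""
    try:
        return s.isascii()  # available from Python 3.7
    except AttributeError:
        return all(c < "\x80" for c in s)


def is_latin1(s):
    """Return True if every char of s is at most U+00FF (linear scan)."""
    return all(c <= "\xff" for c in s)
```

Unlike a size comparison, is_latin1 is a linear scan, so it would make sense 
to compute it once per text rather than on every identifier check.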

We could add a configuration option to assume all-ascii (or better, all-latin1) 
code chars or not, and note that 'all latin1' will run faster but will not 
recognize identifiers containing other chars for the two features that use this.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21765>
_______________________________________