On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
[...]
Anyway, none of the calculations that have been given takes into account
the fact that names can be /less/ than one million characters long.


Not in *my* code they don't!!!

*wink*


The
actual number of non-empty strings of length at most 1000000 characters,
that consist only of ascii letters, digits or underscores, and that
don't start with a digit, is

sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62


I take my hat off to you sir, or possibly madam. That is truly an inspired
piece of pedantry.
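
The closed form quoted above can be sanity-checked for small maximum lengths. This is only a sketch: the function names are made up for illustration, and the brute-force count relies on str.isidentifier(), which does not exclude keywords such as "if" or "in" (neither does the formula).

```python
from itertools import product
import string

# ASCII-only alphabet: 53 characters may start a name, 63 may follow.
FIRST = string.ascii_letters + "_"
REST = FIRST + string.digits

def count_by_sum(max_len):
    """Add up the names of each length 1..max_len."""
    return sum(53 * 63**i for i in range(max_len))

def count_closed_form(max_len):
    """Geometric-series closed form of the same sum."""
    return 53 * (63**max_len - 1) // 62

def count_brute_force(max_len):
    """Enumerate every candidate string and let Python judge it."""
    total = 0
    for n in range(1, max_len + 1):
        for chars in product(REST, repeat=n):
            if "".join(chars).isidentifier():
                total += 1
    return total

# Brute force is only feasible for very small lengths.
for n in (1, 2, 3):
    print(n, count_by_sum(n), count_closed_form(n), count_brute_force(n))
```

The three columns should agree for each n; for one million characters only the first two functions are practical.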


It's perhaps worth mentioning that some non-ascii characters are allowed
in identifiers in Python 3, though I don't know which ones.

PEP 3131 describes the rules:

http://www.python.org/dev/peps/pep-3131/

For example:

py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
...     print(c, ud.name(c), c.isidentifier(), ud.category(c))
...
é LATIN SMALL LETTER E WITH ACUTE True Ll
æ LATIN SMALL LETTER AE True Ll
¥ YEN SIGN False Sc
µ MICRO SIGN True Ll
¿ INVERTED QUESTION MARK False Po
μ GREEK SMALL LETTER MU True Ll
Ж CYRILLIC CAPITAL LETTER ZHE True Lu
ᚃ OGHAM LETTER FEARN True Lo
‰ PER MILLE SIGN False Po
⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
∞ INFINITY False Sm




The isidentifier() method will let you weed out the characters that cannot start an identifier. But there are other groups of characters that can appear after the starting "letter". So a more reasonable sample might be something like:

> py> import unicodedata as ud
> py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
> ...     xc = "X" + c
> ...     print(c, ud.name(c), xc.isidentifier(), ud.category(c))
> ...

In particular,
    http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers

has a definition for id_continue that includes several interesting categories. I expected the non-ASCII digits, but there's other stuff there too, like "nonspacing marks", that I find surprising.
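
A rough way to see the difference between starting and continuing characters is to test each character both alone and after an "X". The sample characters below are my own picks, one from each of the Mn, Mc, Nd and Pc categories, and classify() is just a helper name invented for this sketch.

```python
import unicodedata as ud

# Hand-picked samples, one per category of interest (an assumption,
# not an exhaustive list):
samples = [
    "\u0301",  # COMBINING ACUTE ACCENT (Mn, nonspacing mark)
    "\u0903",  # DEVANAGARI SIGN VISARGA (Mc, spacing combining mark)
    "\u0661",  # ARABIC-INDIC DIGIT ONE (Nd, decimal digit)
    "\u203f",  # UNDERTIE (Pc, connector punctuation)
]

def classify(c):
    """Return (can_start, can_continue) for a single character."""
    can_start = c.isidentifier()
    can_continue = ("X" + c).isidentifier()
    return can_start, can_continue

for c in samples:
    start, cont = classify(c)
    print("U+%04X" % ord(c), ud.name(c), ud.category(c), start, cont)
```

Each of these should be rejected on its own but accepted after the leading "X", whereas something like YEN SIGN is rejected in both positions.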

I'm pretty much speculating here, so please correct me if I'm way off.

--
DaveA

--
http://mail.python.org/mailman/listinfo/python-list
