

On 04/07/2016 15:46, Ned Batchelder wrote:
> On Monday, July 4, 2016 at 10:36:54 AM UTC-4, BartC wrote:
>> On 04/07/2016 13:47, Ned Batchelder wrote:
>>> This is a huge change.
>>
>> I've used a kind of 'weak' import scheme elsewhere, corresponding to C's
>> '#include'.
>>
>> I think that could work in Python provided whatever is defined can
>> tolerate having copies redefined in each module that includes the same
>> file. Anything that is defined once and is never assigned to nor
>> modified for example.
>
> You are hand-waving over huge details of semantics that are very important
> in Python.  For example, it is very important not to have copies of
> classes.  Importing a module must produce the same module object
> everywhere it is imported, and the classes defined in the module must
> be defined only once.


So that would be something that doesn't tolerate copies.

But I think the bigger change for Python wouldn't be a new way of doing
imports; it would be the concept of a user-defined anything that is a
constant at compile-time, and that isn't defined inside a conditional
statement either.

Usually anything that is defined can be changed at run-time, so the
compiler can never assume anything.


--
Bartc

--
https://mail.python.org/mailman/listinfo/python-list
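
To make the "compiler can never assume anything" point concrete, here is a minimal sketch. The module name `demo` and the names `LIMIT` and `over` are made up for illustration; the only claim is that a global is looked up when the function runs, not when it is compiled.

```python
import types

# Build a throwaway module so the example does not touch real globals.
mod = types.ModuleType("demo")
exec(
    "LIMIT = 10\n"
    "def over(x):\n"
    "    return x > LIMIT\n",
    mod.__dict__,
)

first = mod.over(50)    # LIMIT is 10 at this point

# Rebinding the "constant" after the function was compiled is legal,
# and the already-compiled function sees the new value.
mod.LIMIT = 100
second = mod.over(50)

print(first, second)
```

So a compiler that had folded `LIMIT` into `over` as 10 would have changed the program's behaviour.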

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Ned Batchelder

On Monday, July 4, 2016 at 10:36:54 AM UTC-4, BartC wrote:
> On 04/07/2016 13:47, Ned Batchelder wrote:
> > On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote:
> >> On 04/07/2016 03:30, Steven D'Aprano wrote:
> 
> >>> You're still having problems with the whole Python-as-a-dynamic-language
> >>> thing, aren't you? :-)
> 
> >> Most Pythons seem to pre-compile code before executing the result. That
> >> pre-compilation requires that operators and precedences are known in
> >> advance and the resulting instructions are then hard-coded before 
> >> execution.
> >
> > This is the key but subtle point that all the discussion of parser mechanics
> > is missing: Python today needs no information from imported modules in
> > order to compile a file.  When the compiler encounters "import xyzzy" in
> > a file, it doesn't have to do anything to find or read xyzzy.py at compile
> > time.
> 
> Yeah, there's that small detail. Anything affecting how source is to be 
> parsed needs to be known in advance.
> 
> > If operators can be invented, they will only be useful if they can be
> > created in modules which you then import and use.  But that would mean that
> > imported files would have to be found and read during compilation, not
> > during execution as they are now.
> >
> > This is a huge change.
> 
> I've used a kind of 'weak' import scheme elsewhere, corresponding to C's 
> '#include'.
> 
> Then the textual contents of that 'imported' module are read by the 
> compiler, and treated as though they occurred in this module. No new 
> namespace is created.
> 
> I think that could work in Python provided whatever is defined can 
> tolerate having copies redefined in each module that includes the same 
> file. Anything that is defined once and is never assigned to nor 
> modified for example.

You are hand-waving over huge details of semantics that are very important
in Python.  For example, it is very important not to have copies of
classes.  Importing a module must produce the same module object
everywhere it is imported, and the classes defined in the module must
be defined only once.

This is what makes catching exceptions work (because it is based on an
exception being an instance of a particular class), and what makes
class attributes shared among all the instances of the class.

--Ned.
-- 
https://mail.python.org/mailman/listinfo/python-list
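
A rough sketch of why copies of classes matter for exceptions. It simulates '#include'-style textual inclusion by running the same source text in two fresh module objects; `include`, `ParseError` and `fail` are hypothetical names used only for this demo.

```python
import types

SOURCE = '''
class ParseError(Exception):
    pass

def fail():
    raise ParseError("bad input")
'''

def include(name):
    """Simulate textual inclusion: run the same source in a fresh module."""
    mod = types.ModuleType(name)
    exec(SOURCE, mod.__dict__)
    return mod

a = include("copy_a")
b = include("copy_b")

# Same source text, but two distinct class objects.
same_class = a.ParseError is b.ParseError

caught_by = None
try:
    try:
        a.fail()
    except b.ParseError:      # does not match: b's copy is a different class
        caught_by = "b.ParseError"
except a.ParseError:          # only the class that was actually raised matches
    caught_by = "a.ParseError"

print(same_class, caught_by)
```

With a real import, both names would refer to one class object and the inner `except` would have caught the exception.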

Re: Well, I finally ran into a Python Unicode problem, sort of


On 04/07/2016 13:47, Ned Batchelder wrote:
> On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote:
>> On 04/07/2016 03:30, Steven D'Aprano wrote:
>>> You're still having problems with the whole Python-as-a-dynamic-language
>>> thing, aren't you? :-)
>>
>> Most Pythons seem to pre-compile code before executing the result. That
>> pre-compilation requires that operators and precedences are known in
>> advance and the resulting instructions are then hard-coded before execution.
>
> This is the key but subtle point that all the discussion of parser mechanics
> is missing: Python today needs no information from imported modules in
> order to compile a file.  When the compiler encounters "import xyzzy" in
> a file, it doesn't have to do anything to find or read xyzzy.py at compile
> time.

Yeah, there's that small detail. Anything affecting how source is to be
parsed needs to be known in advance.

> If operators can be invented, they will only be useful if they can be
> created in modules which you then import and use.  But that would mean that
> imported files would have to be found and read during compilation, not
> during execution as they are now.
>
> This is a huge change.


I've used a kind of 'weak' import scheme elsewhere, corresponding to C's 
'#include'.


Then the textual contents of that 'imported' module are read by the 
compiler, and treated as though they occurred in this module. No new 
namespace is created.


I think that could work in Python provided whatever is defined can 
tolerate having copies redefined in each module that includes the same 
file. Anything that is defined once and is never assigned to nor 
modified for example.


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Ned Batchelder

On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote:
> On 04/07/2016 03:30, Steven D'Aprano wrote:
> > On Mon, 4 Jul 2016 10:17 am, BartC wrote:
> >
> >> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
> >>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
> >>>> Python lacks a mechanism to add user-defined operators. (R has this
> >>>> capability.) Maybe this feature could be added.
> >>>
> >>> That would be neat. But remember, you would have to define the operator
> >>> precedence as well. So you could no longer use a recursive-descent
> >>> parser.
> >>
> >> That wouldn't be a problem provided the new operator symbol and its
> >> precedence is known at a compile time, and defined before use.
> >
> > You're still having problems with the whole Python-as-a-dynamic-language
> > thing, aren't you? :-)
> 
> Well it isn't completely dynamic, not unless code only exists as a eval 
> or exec argument string (and even there, any changes will only be seen 
> on calling eval or exec again on the same string).
> 
> Most Pythons seem to pre-compile code before executing the result. That 
> pre-compilation requires that operators and precedences are known in 
> advance and the resulting instructions are then hard-coded before execution.

This is the key but subtle point that all the discussion of parser mechanics
is missing: Python today needs no information from imported modules in
order to compile a file.  When the compiler encounters "import xyzzy" in
a file, it doesn't have to do anything to find or read xyzzy.py at compile
time.

If operators can be invented, they will only be useful if they can be
created in modules which you then import and use.  But that would mean that
imported files would have to be found and read during compilation, not
during execution as they are now.

This is a huge change.

--Ned.
-- 
https://mail.python.org/mailman/listinfo/python-list
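
The compile-time versus run-time split can be checked directly. This is only a sketch; the module name below is deliberately one that should not exist on any system.

```python
source = "import xyzzy_module_that_does_not_exist\n"

# Compiling succeeds: the compiler never looks for the module.
code = compile(source, "<demo>", "exec")
compiled_ok = code is not None

# Only executing the import statement goes looking for the file.
try:
    exec(code, {})
    import_failed = False
except ImportError:
    import_failed = True

print(compiled_ok, import_failed)
```

If operators could be defined in imported modules, the `compile()` step above would already have to locate and read that module.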

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Rustom Mody

On Monday, July 4, 2016 at 3:56:43 PM UTC+5:30, BartC wrote:
> On 04/07/2016 02:15, Lawrence D’Oliveiro wrote:
> > On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote:
> >> The structure of such a parser doesn't need to exactly match the grammar
> >> with a dedicated block of code for each operator precedence. It can be
> >> table-driven so that an operator precedence value is just an attribute.
> >
> > Of course. But that’s not a recursive-descent parser any more.
> >
> 
> All the parsers I write work the same way. If I can't describe them as 
> recursive descent, then I don't know what they are.
> 
> This is just recognising that a bunch of specialised functions that are 
> very similar can be reduced to one or two more generalised ones.

In gofer (likewise Haskell) one can concoct any operator and give it a 
precedence
and associativity -- l,r,non

Internals of Haskell I do not know, but of gofer I can say the following:

Implementation is in C.
Uses yacc to parse all operators left-assoc, same precedence
Then post-processes the tree with an elegant little shift-reduce parser
based on specified precedences and associativities.

I sometimes teach this to my kids as an example of how 
FP-style comments can clarify arcane imperative code:

Mark Jones (gofer author) original version + My version made executable
http://blog.languager.org/2016/07/a-little-functional-parser.html
-- 
https://mail.python.org/mailman/listinfo/python-list
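
For readers who want the flavour of that post-processing step without reading C, here is a small sketch in Python. It is not Mark Jones's code: the input is a flat operand/operator list rather than a left-leaning tree, and the `FIXITY` table and the `tidy` function are invented for the demo.

```python
# (precedence, associativity) per operator: "l", "r" or "n" (non-associative).
FIXITY = {"+": (6, "l"), "-": (6, "l"), "*": (7, "l"), "^": (8, "r"), "==": (4, "n")}

def tidy(flat):
    """Turn [operand, op, operand, op, ...] into nested (op, left, right) tuples."""
    operands = [flat[0]]
    operators = []

    def reduce_top():
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for i in range(1, len(flat), 2):
        op, operand = flat[i], flat[i + 1]
        prec, assoc = FIXITY[op]
        while operators:
            top_prec, top_assoc = FIXITY[operators[-1]]
            if top_prec > prec or (top_prec == prec and assoc == "l" and top_assoc == "l"):
                reduce_top()                      # reduce: the operator on the stack binds tighter
            elif top_prec == prec and (assoc == "n" or top_assoc == "n" or assoc != top_assoc):
                raise SyntaxError(f"ambiguous use of {operators[-1]!r} and {op!r}")
            else:
                break                             # shift: the new operator binds tighter
        operators.append(op)
        operands.append(operand)

    while operators:
        reduce_top()
    return operands[0]

print(tidy(["a", "+", "b", "*", "c"]))
print(tidy(["a", "^", "b", "^", "c"]))
```

The parser proper never needs to know the precedences; only this tidy-up pass consults the table.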

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Jussi Piitulainen

BartC writes:

> A simpler approach is to treat user-defined operators as aliases for
> functions:
>
> def myadd(a,b):
>   return a+b
>
> operator ∇:
>    (myadd,2,+3)   # map to myadd, 2 operands, prio 3, LTR
>
> x = y ∇ z
>
> is then equivalent to:
>
> x = myadd(y,z)
>
> However you will usually want to be able to overload the same operator
> for different operand types. That means mapping the operator to one of
> several methods. Maybe even allowing the operator to have either one
> or two operands.
>
> Trickier but still doable I think.

Julia does something like that. The parser knows a number of symbols
that it treats as operators, some of them are aliases for ASCII names,
all operators correspond to generic functions, and the programmer can
add methods for their own types (or for pre-existing types) to these
functions.

Prolog opens its precedence table for the programmer. I don't know if
there's been any Unicode activity, or any activity, in recent years, but
there are actually two different issues here: what is parsed as an
identifier, and what identifiers are treated as operator symbols (with
what precedence and associativity).
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of


On 04/07/2016 02:15, Lawrence D’Oliveiro wrote:
> On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote:
>> The structure of such a parser doesn't need to exactly match the grammar
>> with a dedicated block of code for each operator precedence. It can be
>> table-driven so that an operator precedence value is just an attribute.
>
> Of course. But that’s not a recursive-descent parser any more.



All the parsers I write work the same way. If I can't describe them as 
recursive descent, then I don't know what they are.


This is just recognising that a bunch of specialised functions that are 
very similar can be reduced to one or two more generalised ones.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Jussi Piitulainen

Lawrence D’Oliveiro writes:

> On Monday, July 4, 2016 at 6:08:51 PM UTC+12, Jussi Piitulainen wrote:
>> Something could be done, but if the intention is to allow
>> mathematical notation, it needs to be done with care.
>
> Mathematics uses single-character variable names so that
> multiplication can be implicit.

Certainly on topic, though independent of Unicode. I was thinking of
different classes of operator symbols.

> An old, stillborn language design from the 1960s called CPL* had two
> syntaxes for variable names:
> * a single lowercase letter, optionally followed by any number of primes “'”;
> * an uppercase letter followed by letters or digits.
>
> It also allowed implicit multiplication; single-letter identifiers
> could be run together without spaces, but multi-character ones needed
> to be delimited by spaces or non-identifier characters. E.g.
>
>   Sqrt(bb - 4ac)
>   Area ≡ Length Width
>
> *It was never fully implemented, but a cut-down derivative named BCPL
> did get some use. Some researchers at Bell Labs took it as their
> starting point, first creating a language called “B”, then another one
> called “C” ... well, the rest is history. 

There's been at least D, F, J, K (APL family), R, S (_before_ R), T (a
Lisp), X (the window system), Z (some specification language).

Any single-letter non-ASCII names yet? Spelled-out like Lambda and Omega
don't count.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of


On 04/07/2016 03:30, Steven D'Aprano wrote:
> On Mon, 4 Jul 2016 10:17 am, BartC wrote:
>> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
>>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
>>>> Python lacks a mechanism to add user-defined operators. (R has this
>>>> capability.) Maybe this feature could be added.
>>>
>>> That would be neat. But remember, you would have to define the operator
>>> precedence as well. So you could no longer use a recursive-descent
>>> parser.
>>
>> That wouldn't be a problem provided the new operator symbol and its
>> precedence is known at a compile time, and defined before use.
>
> You're still having problems with the whole Python-as-a-dynamic-language
> thing, aren't you? :-)


Well it isn't completely dynamic, not unless code only exists as a eval 
or exec argument string (and even there, any changes will only be seen 
on calling eval or exec again on the same string).


Most Pythons seem to pre-compile code before executing the result. That 
pre-compilation requires that operators and precedences are known in 
advance and the resulting instructions are then hard-coded before execution.



> In full generality, you would want to be able to define unary prefix, unary
> suffix and binary infix operators, and set their precedence and whether
> they associate to the left or the right. That's probably a bit much to
> expect.


No, that's all possible. Maybe that's even how some language 
implementations work, defining all the set of standard operators at the 
start.



> But if we limit ourselves to the boring case of binary infix operators of a
> single precedence and associativity, there's a simple approach: the parser
> can allow any unicode code point of category "Sm" as a legal operator, e.g.
> x ∇ y. Pre-defined operators like + - * etc continue to call the same
> dunder methods they already do, but anything else tries calling:
>
> x.__oper__('∇', y)
> y.__roper__('∇', x)
>
> and if neither of those exist and return a result other than NotImplemented,
> then finally raise a runtime TypeError('undefined operator ∇').


A simpler approach is to treat user-defined operators as aliases for 
functions:


def myadd(a,b):
    return a+b

operator ∇:
   (myadd,2,+3)   # map to myadd, 2 operands, prio 3, LTR

x = y ∇ z

is then equivalent to:

x = myadd(y,z)

However you will usually want to be able to overload the same operator for 
different operand types. That means mapping the operator to one of 
several methods. Maybe even allowing the operator to have either one or 
two operands.


Trickier but still doable I think.

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Marko Rauhamaa

Lawrence D’Oliveiro :

> Mathematics uses single-character variable names so that
> multiplication can be implicit.

I don't think anybody developed mathematical notation systematically.
Rather, over the centuries, various masters came up with personal
abbreviations and shorthand, which spread among admirers and students
through emulation. The resulting two-dimensional hodgepodge needs to be
supplemented by much natural-language handwaving. Rigorous treatment
needs to use a formal language, e.g. <http://us.metamath.org/mpeuni/evlslem2.html>.

Anyway, most programming has little use for mathematics. Thus, a
general-purpose programming language shouldn't bend over backwards to
placate that particular application domain.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Lawrence D’Oliveiro

On Monday, July 4, 2016 at 6:08:51 PM UTC+12, Jussi Piitulainen wrote:
> Something could be done, but if the intention is to allow
> mathematical notation, it needs to be done with care.

Mathematics uses single-character variable names so that multiplication can be 
implicit.

An old, stillborn language design from the 1960s called CPL* had two syntaxes 
for variable names:
* a single lowercase letter, optionally followed by any number of primes “'”;
* an uppercase letter followed by letters or digits.

It also allowed implicit multiplication; single-letter identifiers could be run 
together without spaces, but multi-character ones needed to be delimited by 
spaces or non-identifier characters. E.g.

  Sqrt(bb - 4ac)
  Area ≡ Length Width

*It was never fully implemented, but a cut-down derivative named BCPL did get 
some use. Some researchers at Bell Labs took it as their starting point, first 
creating a language called “B”, then another one called “C” ... well, the rest 
is history.
-- 
https://mail.python.org/mailman/listinfo/python-list
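
The two identifier syntaxes described above are enough to split run-together names mechanically. A rough sketch of such a lexer follows; the regular expression is my reading of the two rules as stated in the post, not taken from any CPL document, and treating numbers as plain digit runs is an assumption.

```python
import re

TOKEN = re.compile(
    r"\d+"                   # a number (assumed: a plain run of digits)
    r"|[a-z]'*"              # one lowercase letter plus any number of primes
    r"|[A-Z][A-Za-z0-9]*"    # an uppercase letter followed by letters or digits
    r"|\S"                   # any other non-space character stands alone
)

def tokenize(text):
    """Split CPL-style source text into tokens; whitespace is skipped."""
    return TOKEN.findall(text)

print(tokenize("Sqrt(bb - 4ac)"))
print(tokenize("Area ≡ Length Width"))
```

Adjacent operand tokens with nothing between them (`b b`, `4 a c`, `Length Width`) are then the places where a parser would supply the implicit multiplication.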

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Jussi Piitulainen

Rustom Mody writes:

> Subscripts OTOH as part of identifier-lexemes don't seem to have any
> issues

They have the general issue that one might *want* them interpreted as
indexes, so that a₁ would mean the same as a[1].

Mathematical symbols face similar issues. One would not *want* them all
be binary operators; a specific level of precedence would not be good
for all uses; and some uses of some symbols need chaining and then
parentheses do not help. Just for the starters.

> My main point being: Unicode gives a wide repertory -- that's good.
> It also gives char-classification -- that's a start.
> But it's not enough for designing a (modern) programming language.

So I agree. Something could be done, but if the intention is to allow
mathematical notation, it needs to be done with care.

(And no, I'm not saying Python needs to do anything at this time, and I
do not express any opinion on how likely Python is to do anything about
Unicode math at this time or ever, and so on. Just that I would not be
happy to have all those symbols available in a way that is not usable
for the intended purpose so please do take care.)
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Rustom Mody

On Monday, July 4, 2016 at 8:03:47 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 4 Jul 2016 07:28 am, Lawrence D’Oliveiro wrote:
> 
> > On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote:
> >> Here's another worm for the can.  Would you rather read this...
> >> 
> >> d = sqrt(x**2 + y**2)
> >> 
> >> ...or this?
> >> 
> >> d = √(x² + y²)
> > 
> > Neither. I would rather see
> > 
> > d = math.hypot(x, y)
> > 
> > Much simpler, don’t you think?
> 
> Only if you think of x and y as the sides of a triangle, and remember
> that "hypot" is a Unix-like abbreviation for hypotenuse (rather than,
> say, "hypothesis"). And it doesn't help you one bit when it comes to:
> 
> a = √(4x²y - 3xy² + 2xy - 1)

In math typically one would write

a = √4x²y - 3xy² + 2xy - 1

with the radical sign's bar running along up to and slightly beyond the 1.

My Unicode prowess is not up to doing that,
though experts may be able to use macrons/overlines.

> 
> 
> Personally, I'm not convinced about using the very limited number of
> superscript code points to represent exponentiation. Using √ as an unary
> operator looks cute, but I don't know that it adds enough to the language
> to justify the addition.

I guess I am more or less in agreement (on THIS/THESE),
i.e. √ and superscripts are probably not worth the headache.

Subscripts OTOH as part of identifier-lexemes don't seem to have any issues

Python3

>>> a₁ = 1
  File "<stdin>", line 1
    a₁ = 1
     ^
SyntaxError: invalid character in identifier

Haskell already has it

Prelude>  let a₁ = 1
Prelude>  a₁
1
Prelude> 

Haskell allows the same for superscripts:

Prelude> let a¹ = 1
Prelude> a¹
1

which is probably not such a great idea!
Prelude>  a¹ +   a₁
2
Prelude> 

My main point being: Unicode gives a wide repertory -- that's good.
It also gives char-classification -- that's a start.
But it's not enough for designing a (modern) programming language.

Of course one can stay with ASCII
Like "There are many ways to skin a cat"
the modern version would be "There are many ways to be a Luddite"
-- 
https://mail.python.org/mailman/listinfo/python-list
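
The char-classification side can be inspected from Python itself. A small sketch using only the standard library (`describe` is a helper name made up here):

```python
import unicodedata

def describe(name):
    """Return (is it a legal Python identifier?, general category of each character)."""
    return name.isidentifier(), [unicodedata.category(ch) for ch in name]

subscript = describe("a₁")   # SUBSCRIPT ONE is category No, which identifiers exclude
pi = describe("π")           # a lowercase letter, so a legal identifier
nabla = describe("∇")        # a math symbol (Sm), not a letter

print(subscript, pi, nabla)
```

This is the classification that makes a₁ a syntax error in Python 3 while π is accepted.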

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Steven D'Aprano

On Mon, 4 Jul 2016 07:28 am, Lawrence D’Oliveiro wrote:

> On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote:
>> Here's another worm for the can.  Would you rather read this...
>> 
>> d = sqrt(x**2 + y**2)
>> 
>> ...or this?
>> 
>> d = √(x² + y²)
> 
> Neither. I would rather see
> 
> d = math.hypot(x, y)
> 
> Much simpler, don’t you think?

Only if you think of x and y as the sides of a triangle, and remember
that "hypot" is a Unix-like abbreviation for hypotenuse (rather than,
say, "hypothesis"). And it doesn't help you one bit when it comes to:

a = √(4x²y - 3xy² + 2xy - 1)

Personally, I'm not convinced about using the very limited number of
superscript code points to represent exponentiation. Using √ as an unary
operator looks cute, but I don't know that it adds enough to the language
to justify the addition.

-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Steven D'Aprano

On Mon, 4 Jul 2016 10:17 am, BartC wrote:

> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
>>> Python lacks a mechanism to add user-defined operators. (R has this
>>> capability.) Maybe this feature could be added.
>>
>> That would be neat. But remember, you would have to define the operator
>> precedence as well. So you could no longer use a recursive-descent
>> parser.
> 
> That wouldn't be a problem provided the new operator symbol and its
> precedence is known at a compile time, and defined before use.

You're still having problems with the whole Python-as-a-dynamic-language
thing, aren't you? :-)

In full generality, you would want to be able to define unary prefix, unary
suffix and binary infix operators, and set their precedence and whether
they associate to the left or the right. That's probably a bit much to
expect.

But if we limit ourselves to the boring case of binary infix operators of a
single precedence and associativity, there's a simple approach: the parser
can allow any unicode code point of category "Sm" as a legal operator, e.g.
x ∇ y. Pre-defined operators like + - * etc continue to call the same
dunder methods they already do, but anything else tries calling:

x.__oper__('∇', y)
y.__roper__('∇', x)

and if neither of those exist and return a result other than NotImplemented,
then finally raise a runtime TypeError('undefined operator ∇').

But I don't think this will ever be part of Python.

-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
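
A sketch of what that dispatch could look like if written as an ordinary function. Nothing here exists in Python: `__oper__` and `__roper__` are the hypothetical methods from the post, and `apply_operator` and `Vec` are names invented for the demo (the meaning given to ∇ is arbitrary).

```python
def apply_operator(symbol, x, y):
    """Emulate the proposed lookup: try x.__oper__, then y.__roper__, else TypeError."""
    forward = getattr(type(x), "__oper__", None)
    if forward is not None:
        result = forward(x, symbol, y)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(y), "__roper__", None)
    if reflected is not None:
        result = reflected(y, symbol, x)
        if result is not NotImplemented:
            return result
    raise TypeError(f"undefined operator {symbol}")

class Vec:
    def __init__(self, *items):
        self.items = items

    def __oper__(self, symbol, other):
        # Arbitrary demo meaning: treat ∇ between two Vecs as a dot product.
        if symbol == "∇" and isinstance(other, Vec):
            return sum(a * b for a, b in zip(self.items, other.items))
        return NotImplemented

print(apply_operator("∇", Vec(1, 2), Vec(3, 4)))
```

The parser would only need to turn `x ∇ y` into the equivalent of `apply_operator('∇', x, y)`; everything else stays a run-time lookup.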

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832

On Sun, Jul 3, 2016, at 21:15, Lawrence D’Oliveiro wrote:
> On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote:
> > The structure of such a parser doesn't need to exactly match the grammar 
> > with a dedicated block of code for each operator precedence. It can be 
> > table-driven so that an operator precedence value is just an attribute.
> 
> Of course. But that’s not a recursive-descent parser any more.

It's still recursive descent if it, for example, calls the _same_ block
of code recursively with arguments to tell it which operator is being
considered. This would be analogous to, in Python, implementing a
recursive-descent parser with arbitrary callable objects instead of
simple functions.
-- 
https://mail.python.org/mailman/listinfo/python-list
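
As a sketch of that idea: one recursive function, parameterised by a minimum precedence, driven by a table. The table layout, the operator set and the made-up ∇ entry are assumptions for illustration; the parser evaluates as it goes to keep the example short.

```python
import re

# symbol -> (precedence, right-associative?, function)
OPERATORS = {
    "+": (1, False, lambda a, b: a + b),
    "-": (1, False, lambda a, b: a - b),
    "*": (2, False, lambda a, b: a * b),
    "^": (3, True, lambda a, b: a ** b),
}

def tokenize(text):
    return re.findall(r"\d+|\S", text)

def parse(tokens, min_prec=0):
    """Parse and evaluate, accepting only operators of at least min_prec."""
    value = parse_atom(tokens)
    while tokens and tokens[0] in OPERATORS and OPERATORS[tokens[0]][0] >= min_prec:
        prec, right_assoc, func = OPERATORS[tokens.pop(0)]
        # The same block of code recurses; only the precedence argument changes.
        rhs = parse(tokens, prec if right_assoc else prec + 1)
        value = func(value, rhs)
    return value

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        value = parse(tokens)
        tokens.pop(0)          # discard the closing ")"
        return value
    return int(tok)

def evaluate(text):
    return parse(tokenize(text))

# Adding an operator is a table update, not a new parsing function.
OPERATORS["∇"] = (2, False, lambda a, b: abs(a - b))

print(evaluate("1 + 2 * 3"), evaluate("2 ^ 3 ^ 2"), evaluate("1 + 7 ∇ 3 * 2"))
```

Whether one still calls this "recursive descent" is the terminology question being argued; the recursion itself is plainly still there.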

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832

On Sun, Jul 3, 2016, at 20:00, Lawrence D’Oliveiro wrote:
> That would be neat. But remember, you would have to define the operator
> precedence as well. So you could no longer use a recursive-descent
> parser.

You could use a recursive-descent parser if you monkey-patch the parser
when adding a new operator.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote:
> The structure of such a parser doesn't need to exactly match the grammar 
> with a dedicated block of code for each operator precedence. It can be 
> table-driven so that an operator precedence value is just an attribute.

Of course. But that’s not a recursive-descent parser any more.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC


On 04/07/2016 01:24, Lawrence D’Oliveiro wrote:
> On Monday, July 4, 2016 at 12:17:47 PM UTC+12, BartC wrote:
>> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
>>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
>>>> Python lacks a mechanism to add user-defined operators. (R has this
>>>> capability.) Maybe this feature could be added.
>>>
>>> That would be neat. But remember, you would have to define the operator
>>> precedence as well. So you could no longer use a recursive-descent parser.
>>
>> That wouldn't be a problem provided the new operator symbol and its
>> precedence is known at a compile time, and defined before use.
>
> That is how it is normally done. (E.g. Algol 68.)
>
> But you still couldn’t use a recursive-descent parser.

Why not?

The structure of such a parser doesn't need to exactly match the grammar 
with a dedicated block of code for each operator precedence. It can be 
table-driven so that an operator precedence value is just an attribute.


--
Bartc



--
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

On Monday, July 4, 2016 at 12:17:47 PM UTC+12, BartC wrote:
>
> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
>>
>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
>>>
>>> Python lacks a mechanism to add user-defined operators. (R has this
>>> capability.) Maybe this feature could be added.
>>
>> That would be neat. But remember, you would have to define the operator
>> precedence as well. So you could no longer use a recursive-descent parser.
> 
> That wouldn't be a problem provided the new operator symbol and its 
> precedence is known at a compile time, and defined before use.

That is how it is normally done. (E.g. Algol 68.)

But you still couldn’t use a recursive-descent parser.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC


On 04/07/2016 01:00, Lawrence D’Oliveiro wrote:
> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
>> Python lacks a mechanism to add user-defined operators. (R has this
>> capability.) Maybe this feature could be added.
>
> That would be neat. But remember, you would have to define the operator
> precedence as well. So you could no longer use a recursive-descent parser.


That wouldn't be a problem provided the new operator symbol and its
precedence are known at compile time, and defined before use.



--
Bartc

--
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote:
> Python lacks a mechanism to add user-defined operators. (R has this
> capability.) Maybe this feature could be added.

That would be neat. But remember, you would have to define the operator 
precedence as well. So you could no longer use a recursive-descent parser.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread eryk sun

On Sun, Jul 3, 2016 at 6:58 AM, John Ladasky  wrote:
> The nabla symbol (∇) is used in the naming of gradients. Python isn't having
> it. The interpreter throws a "SyntaxError: invalid character in identifier"
> when it encounters the ∇.

Del is a mathematical operator to take the gradient. It's not part of
the name. For `∇f`, the operator is `∇` and the function name is `f`.
Python lacks a mechanism to add user-defined operators. (R has this
capability.) Maybe this feature could be added. To make parsing
simple, user-defined operators could be limited to non-ASCII symbol
characters (math and other -- Sm, So). That simple option is off the
table if we allow symbol characters in names.

Adding an operator to the language itself requires a PEP. Recently PEP
465 added an `@` operator for matrix products. For example:

>>> x = np.array([1j, 1])
>>> x @ x
0j
>>> x @ x.conj() # Hermitian inner product
(2+0j)

Note that using a non-ASCII operator was ruled out:

http://legacy.python.org/dev/peps/pep-0465/#choice-of-operator
-- 
https://mail.python.org/mailman/listinfo/python-list
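
For anyone without NumPy to hand, the `@` operator can be tried on a user-defined class through the `__matmul__` method that PEP 465 introduced. `Vec` below is a toy class written for this sketch, not part of any library.

```python
class Vec:
    """A toy vector that supports only the @ operator (as an inner product)."""

    def __init__(self, *items):
        self.items = items

    def __matmul__(self, other):
        if not isinstance(other, Vec) or len(other.items) != len(self.items):
            return NotImplemented
        return sum(a * b for a, b in zip(self.items, other.items))

x = Vec(1j, 1)
plain = x @ x                                               # no conjugation
hermitian = x @ Vec(*(c.conjugate() for c in x.items))      # conjugate the second operand

print(plain, hermitian)
```

The two results match the NumPy session above: zero for the plain product and 2 for the Hermitian one.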

Re: Well, I finally ran into a Python Unicode problem, sort of

On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote:
> Here's another worm for the can.  Would you rather read this...
> 
> d = sqrt(x**2 + y**2)
> 
> ...or this?
> 
> d = √(x² + y²)

Neither. I would rather see

d = math.hypot(x, y)

Much simpler, don’t you think?
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

Random832 :

> Being able to put any character in a symbol doesn't make those strings
> identifiers, any more than passing them to getattr/setattr (or
> __import__, something's __name__, etc) does in Python.

From R7RS, the newest Scheme standard (p. 61-62):

 7.1.1. Lexical structure
 [...]
 〈vertical line〉 → |
 [...]
 〈identifier〉 → 〈initial〉 〈subsequent〉*
  | 〈vertical line〉 〈symbol element〉* 〈vertical line〉
  | 〈peculiar identifier〉
 〈initial〉 → 〈letter〉 | 〈special initial〉
 〈letter〉 → a | b | c | ... | z
 | A | B | C | ... | Z
 〈special initial〉 → ! | $ | % | & | * | / | : | < | =
 | > | ? | ^ | _ | ~
 〈subsequent〉 → 〈initial〉 | 〈digit〉
 | 〈special subsequent〉
 〈digit〉 → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
 〈hex digit〉 → 〈digit〉 | a | b | c | d | e | f
 〈explicit sign〉 → + | -
 〈special subsequent〉 → 〈explicit sign〉 | . | @
 〈inline hex escape〉 → \x〈hex scalar value〉;
 〈hex scalar value〉 → 〈hex digit〉 +
 〈mnemonic escape〉 → \a | \b | \t | \n | \r
 〈peculiar identifier〉 → 〈explicit sign〉
 | 〈explicit sign〉 〈sign subsequent〉 〈subsequent〉*
 | 〈explicit sign〉 . 〈dot subsequent〉 〈subsequent〉*
 | . 〈dot subsequent〉 〈subsequent〉*
 〈dot subsequent〉 → 〈sign subsequent〉 | .
 〈sign subsequent〉 → 〈initial〉 | 〈explicit sign〉 | @
 〈symbol element〉 →
 〈any character other than 〈vertical line〉or \〉
 | 〈inline hex escape〉 | 〈mnemonic escape〉 | \|


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

On Sunday, July 3, 2016 at 11:50:52 PM UTC+12, BartC wrote:
> Otherwise you can be looking at:
> 
>a b c d e f g h
> 
> (not Scheme) and wondering which are names and which are operators.

I did a language design for my MSc thesis where all “functions” were operators. 
So a construct like “f(a, b, c)” was really a monadic operator “f” followed by 
a single argument, a record constructor “(a, b, c)”.

Re: Well, I finally ran into a Python Unicode problem, sort of

On Sunday, July 3, 2016 at 9:02:05 PM UTC+12, Marko Rauhamaa wrote:
> Lawrence D’Oliveiro:
> 
>> On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote:
>>
>>> Personally, I don't think even π should be used in identifiers.
>>
>> Why not?
> 
> 1. It can't be typed easily.

I have a custom .XCompose, so it’s just “compose-p-i”. Easy to type, easy to 
remember.

> 2. It can look like an n.

Only to someone accustomed to using just one alphabet. :)

> 3. Single-character identifiers should not be promoted, especially with
>    a global scope.

It’s no more “global” than “math.e”. And what about “1j”? (That completes the 
triumvirate of single-letter names from the Euler identity.)

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832

On Sun, Jul 3, 2016, at 07:22, Marko Rauhamaa wrote:
> Christian Gollwitzer :
> > On 03.07.16 at 13:01, Marko Rauhamaa wrote:
> >> Scheme allows *any* characters whatsoever in identifiers.
> >
> > Parentheses?
> 
> Yes.
> 
> Hint: Python allows *any* characters whatsoever in strings.

Being able to put any character in a symbol doesn't make those strings
identifiers, any more than passing them to getattr/setattr (or
__import__, something's __name__, etc) does in Python.
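
A small sketch of the Python side of this point. The class Namespace and the attribute name are hypothetical, chosen only for illustration: setattr/getattr accept a string that could never appear as an identifier in source code.

```python
class Namespace:
    """Empty container used only to hold attributes (hypothetical)."""


ns = Namespace()
name = "not an identifier (at all)"

# Attribute names are just dictionary keys, so any string is accepted here...
setattr(ns, name, 42)
print(getattr(ns, name))

# ...but the string still fails the lexical test for an identifier.
print(name.isidentifier())
```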

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread MRAB

On 2016-07-03 19:39, John Ladasky wrote:
> On Sunday, July 3, 2016 at 12:42:14 AM UTC-7, Chris Angelico wrote:
>> On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky wrote:
>>
>> Very good question! The detaily answer is here:
>>
>> https://docs.python.org/3/reference/lexical_analysis.html#identifiers
>>
>>> A philosophical question. Why should any character be excluded from a
>>> variable name, besides the fact that it might also be an operator?
>>
>> In a way, that's exactly what's happening here. Python permits certain
>> categories of character as identifiers, leaving other categories
>> available for operators. Even though there aren't any non-ASCII
>> operators in a vanilla CPython, it's plausible that someone could
>> create a Python-based language with more operators (eg ≠ NOT EQUAL TO
>> as an alternative to !=), and I'm sure you'd agree that saying "≠ = 1"
>> is nonsensical.
>
> I agree that there are some characters in the Unicode definition that could
> (should?) be operators and, as such, disallowed in identifiers. "≠", "≥" and
> "√" come to mind. I don't know whether the Unicode "character properties" are
> assigned to the characters in a way that would be satisfying to the needs of
> programmers. I'll do some reading.
>
>> Symbols like that are a bit of a
>> grey area, so you may find that you're starting a huge debate :)
>
> Oh, I can see that debate coming. I know that not all of these characters are
> easily TYPED, and so I have to reach for a Unicode table to cut and paste them.
> But once cut and pasted, they are easily READ, and that's a big plus.
>
> Here's another worm for the can. Would you rather read this...
>
> d = sqrt(x**2 + y**2)
>
> ...or this?
>
> d = √(x² + y²)

It's easy to read something as simple as that, but it's harder when
the exponent is more than a number or a variable. And what about a**b**c?

Not to mention the limited number of superscript codepoints available...
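
A short illustration of both remarks, with made-up values for a, b and c. The first part shows how Python already resolves a**b**c; the second shows how current Python treats a superscript digit, which is a separate matter from whether a language could give it operator meaning.

```python
import unicodedata

# ** is right-associative, so a**b**c means a**(b**c).
a, b, c = 2, 3, 2
chained = a ** b ** c
print(chained, a ** (b ** c), (a ** b) ** c)

# A superscript digit is a number-like character (category "No"),
# and it is not accepted as part of an identifier.
print(unicodedata.category("²"))
print("x²".isidentifier())
```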

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread John Ladasky

On Sunday, July 3, 2016 at 12:42:14 AM UTC-7, Chris Angelico wrote:
> On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky wrote:

> Very good question! The detaily answer is here:
> 
> https://docs.python.org/3/reference/lexical_analysis.html#identifiers
> 
> > A philosophical question.  Why should any character be excluded from a 
> > variable name, besides the fact that it might also be an operator?
> 
> In a way, that's exactly what's happening here. Python permits certain
> categories of character as identifiers, leaving other categories
> available for operators. Even though there aren't any non-ASCII
> operators in a vanilla CPython, it's plausible that someone could
> create a Python-based language with more operators (eg ≠ NOT EQUAL TO
> as an alternative to !=), and I'm sure you'd agree that saying "≠ = 1"
> is nonsensical.

I agree that there are some characters in the Unicode definition that could 
(should?) be operators and, as such, disallowed in identifiers.  "≠", "≥" and 
"√" come to mind.  I don't know whether the Unicode "character properties" are 
assigned to the characters in a way that would be satisfying to the needs of 
programmers.  I'll do some reading.

> Symbols like that are a bit of a
> grey area, so you may find that you're starting a huge debate :)

Oh, I can see that debate coming.  I know that not all of these characters are 
easily TYPED, and so I have to reach for a Unicode table to cut and paste them. 
But once cut and pasted, they are easily READ, and that's a big plus.

Here's another worm for the can.  Would you rather read this...

d = sqrt(x**2 + y**2)

...or this?

d = √(x² + y²)


Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread John Ladasky

Lawrence, I trust you understand that I didn't post a complete working program, 
just a few lines showing the intended usage?

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Chris Angelico

On Sun, Jul 3, 2016 at 7:01 PM, Marko Rauhamaa  wrote:
> Lawrence D’Oliveiro :
>
>> On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote:
>>
>>> Personally, I don't think even π should be used in identifiers.
>>
>> Why not?
>
> 1. It can't be typed easily.
>
> 2. It can look like an n.
>
> 3. Single-character identifiers should not be promoted, especially with
>    a global scope.

None of these is a language-level concern. You can't type it? That's
your problem - and you can choose not to use it. But Python lets you,
if you want to. Remember, some people speak Greek natively, and for
those people, typing Greek text is as natural as typing Latin text is
for us. Similarly, Cyrillic text is the most natural language for
Russian speakers. Why should Python block them?

Your other concerns might be a case for linters, but definitely not
the language.
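
To make the "case for linters" concrete, here is a rough sketch of such a check built on the standard ast module. The function name flag_names and the two rules it applies (single-character names, non-ASCII names) are assumptions for illustration, not the behaviour of any existing linter.

```python
import ast


def flag_names(source):
    """Return assigned names that are single-character or non-ASCII."""
    flagged = set()
    for node in ast.walk(ast.parse(source)):
        # Only look at names being assigned to, not names being read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
            if len(name) == 1 or not name.isascii():
                flagged.add(name)
    return sorted(flagged)


sample = "π = 3.14159\nradius = 2\nc = 2 * π * radius\n"
print(flag_names(sample))
```

The language accepts the sample unchanged; whether π or c deserves a warning is a policy decision that a project can make for itself.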

ChrisA

Re: Well, I finally ran into a Python Unicode problem, sort of

Christian Gollwitzer :

> On 03.07.16 at 13:22, Marko Rauhamaa wrote:
>> Christian Gollwitzer :
>>> On 03.07.16 at 13:01, Marko Rauhamaa wrote:
>>>> Scheme allows *any* characters whatsoever in identifiers.
>>> Parentheses?
>> Yes.
>
> My knowledge of Scheme is rusty. How do you do that?

   Moreover, all characters whose Unicode scalar values are greater than
   127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
   Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within
   identifiers. In addition, any character can be used within an
   identifier when specified via an <inline hex escape>. For example,
   the identifier H\x65;llo is the same as the identifier Hello, and the
   identifier \x3BB; is the same as the identifier λ.
   <http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4>

Guile doesn't support the R6RS inline hex escape notation. Instead, it
natively supports a notation of its own:

   #{foo bar}#

   #{what
   ever}#

   #{4242}#

Or the R7RS notation:

   |foo bar|
   |\x3BB; is a greek lambda|
   |\| is a vertical bar|

   <https://www.gnu.org/software/guile/manual/html_node/Symbol-Read-Syntax.html#index-r7rs_002dsymbols>


Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Christian Gollwitzer


On 03.07.16 at 13:22, Marko Rauhamaa wrote:
> Christian Gollwitzer :
>> On 03.07.16 at 13:01, Marko Rauhamaa wrote:
>>> Alain Ketterlin :
>>>
>>>> It would be very confusing to have a variable named ∇f, as confusing
>>>> as naming a variable a+b or √x.
>>>
>>> Scheme allows *any* characters whatsoever in identifiers.
>>
>> Parentheses?
>
> Yes.
>
> Hint: Python allows *any* characters whatsoever in strings.


My knowledge of Scheme is rusty. How do you do that? Consider

(define x 'hello)

then the x is the identifier, isn't it? How can you include a 
metacharacter like space, ', or ( in it? I'm using 
https://repl.it/languages/scheme to try it out.


Another language which allows any characters in identifiers is Tcl. Here 
you can quote identifiers:


set {a b} c

creates a variable "a b" with a space in it, because there is no 
distinction between quoted/unquoted. Metacharacters can be included by 
\-escapes. How does that work in Scheme?


Christian


Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC


On 03/07/2016 12:01, Marko Rauhamaa wrote:
> Alain Ketterlin :
>
>> It would be very confusing to have a variable named ∇f, as confusing
>> as naming a variable a+b or √x.
>
> Scheme allows *any* characters whatsoever in identifiers.


I think it's one of those languages that has already dispensed with most 
syntax anyway. Including distinctions between names and symbols.


Some people think that extra syntax rules including enforcing such 
distinctions and having restrictions can improve readability. Otherwise 
you can be looking at:


  a b c d e f g h

(not Scheme) and wondering which are names and which are operators.

--
Bartc

Re: Well, I finally ran into a Python Unicode problem, sort of

Christian Gollwitzer :

> On 03.07.16 at 13:01, Marko Rauhamaa wrote:
>> Alain Ketterlin :
>>
>>> It would be very confusing to have a variable named ∇f, as confusing
>>> as naming a variable a+b or √x.
>>
>> Scheme allows *any* characters whatsoever in identifiers.
>
> Parentheses?

Yes.

Hint: Python allows *any* characters whatsoever in strings.


Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Christian Gollwitzer


On 03.07.16 at 13:01, Marko Rauhamaa wrote:
> Alain Ketterlin :
>
>> It would be very confusing to have a variable named ∇f, as confusing
>> as naming a variable a+b or √x.
>
> Scheme allows *any* characters whatsoever in identifiers.


Parentheses?

Christian


Re: Well, I finally ran into a Python Unicode problem, sort of

Alain Ketterlin :

> It would be very confusing to have a variable named ∇f, as confusing
> as naming a variable a+b or √x.

Scheme allows *any* characters whatsoever in identifiers.


Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Alain Ketterlin

John Ladasky  writes:

> from math import pi as π
> [...]
> c = 2 * π * r

> Up until today, every character I've tried has been accepted by the
> Python interpreter as a legitimate character for inclusion in a
> variable name. Now I'm copying a formula which defines a gradient. The
> nabla symbol (∇) is used in the naming of gradients. Python isn't
> having it. The interpreter throws a "SyntaxError: invalid character in
> identifier" when it encounters the ∇.

The rules are at
https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers

To me it makes a lot of sense to *not* include category Sm characters in
identifiers, since they are usually used to denote operators (like +).
It would be very confusing to have a variable named ∇f, as confusing as
naming a variable a+b or √x.
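
The category argument can be checked directly with the standard unicodedata module. This is only a sketch over a handful of sample characters picked for illustration; it shows that ∇ and √ share category Sm with +, while π is a lowercase letter.

```python
import unicodedata

samples = "+∇√≠≥πf"

# Map each character to (general category, usable as a one-character identifier).
table = {ch: (unicodedata.category(ch), ch.isidentifier()) for ch in samples}

for ch, (category, ok) in table.items():
    print(ch, category, ok)
```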

-- Alain.

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Robert Kern


On 2016-07-03 08:29, Jussi Piitulainen wrote:
> (Hm. Python seems to understand that the character occurs in what is
> intended to be an identifier. Perhaps that's a default error message.)


I suspect that "identifier" is the final catch-all token in the lexer. Comments 
and strings are clearly delimited. Keywords, operators, and [{(braces)}] are all 
explicitly whitelisted from finite lists. Well, I guess it could have been 
intended by the user to be a numerical literal, but I suspect that's attempted 
before identifier.
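
One way to observe what the lexer reports without guessing is to compile small snippets and capture the error. This is a sketch: try_compile is a hypothetical helper, and the exact message text differs between Python versions, so the code only returns it rather than assuming its wording.

```python
def try_compile(source):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


for snippet in ("πf = 1", "∇f = 1"):
    print(repr(snippet), "->", try_compile(snippet))
```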


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco


Re: Well, I finally ran into a Python Unicode problem, sort of

Lawrence D’Oliveiro :

> On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote:
>
>> Personally, I don't think even π should be used in identifiers.
>
> Why not?

1. It can't be typed easily.

2. It can look like an n.

3. Single-character identifiers should not be promoted, especially with
   a global scope.

> Python already has all the other single-character constants in what is
> probably the most fundamental identity in all of mathematics:
>
> $$e^{i \pi} + 1 = 0$$

Mathematics and physics have run into trouble with single-character
identifiers already. They have run out of letters and have had to reuse
them. Programmers used to have the same problem until they realized it's
ok to use descriptive names.

Just say,

>>> import cmath
>>> cmath.e ** (1j * cmath.pi) + 1
1.2246467991473532e-16j
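
The tiny imaginary residue above is floating-point rounding, not a failure of the identity. A sketch of how one might check the identity numerically, using a tolerance instead of exact equality:

```python
import cmath

# e^(i*pi) + 1 should be zero up to rounding error.
residual = cmath.exp(1j * cmath.pi) + 1
print(abs(residual))

# Compare against -1 with a tolerance rather than with ==.
print(cmath.isclose(cmath.exp(1j * cmath.pi), -1))
```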

Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote:

> Personally, I don't think even π should be used in identifiers.

Why not? Python already has all the other single-character constants in what is 
probably the most fundamental identity in all of mathematics:

$$e^{i \pi} + 1 = 0$$

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Chris Angelico

On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky  wrote:
> Up until today, every character I've tried has been accepted by the Python 
> interpreter as a legitimate character for inclusion in a variable name.  Now 
> I'm copying a formula which defines a gradient.  The nabla symbol (∇) is used 
> in the naming of gradients.  Python isn't having it.  The interpreter throws 
> a "SyntaxError: invalid character in identifier" when it encounters the ∇.
>
> I am now wondering what constitutes a valid character for an identifier, and 
> how they were chosen.  Obviously, the Western alphabet and standard Greek 
> letters work.  I just tried a few very weird characters from the Latin 
> Extended range, and some Cyrillic characters.  These are also fine.
>

Very good question! The detaily answer is here:

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

> A philosophical question.  Why should any character be excluded from a 
> variable name, besides the fact that it might also be an operator?
>

In a way, that's exactly what's happening here. Python permits certain
categories of character as identifiers, leaving other categories
available for operators. Even though there aren't any non-ASCII
operators in a vanilla CPython, it's plausible that someone could
create a Python-based language with more operators (eg ≠ NOT EQUAL TO
as an alternative to !=), and I'm sure you'd agree that saying "≠ = 1"
is nonsensical.

> This might be a problem I can solve, I'm not sure.  Is there a file that the 
> Python interpreter refers to which defines the accepted variable name 
> characters?  Perhaps I could just add ∇.
>

The key here is its Unicode category:

>>> import unicodedata
>>> unicodedata.category("∇")
'Sm'

You could probably hack CPython to include Sm, and maybe Sc, Sk, and
So, as valid identifier characters. I'm not sure where, though, and
I've just spent a good bit of time delving (it's based on the
XID_Start and XID_Continue derived properties, but I have no idea
where they're defined - Tools/unicode/makeunicodedata.py looks
promising, but even there, I can't find it). And - or maybe instead -
you could appeal to the core devs to have the category/ies in question
added to the official Python spec. Symbols like that are a bit of a
grey area, so you may find that you're starting a huge debate :)
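
Without digging into the CPython sources, the effective rule can be surveyed empirically. The sketch below asks the running interpreter which characters in the Basic Multilingual Plane it accepts as a one-character identifier and tallies them by general category. It says nothing about where the tables are defined, and the counts depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from collections import Counter

# Tally the general category of every BMP character that is valid
# on its own as an identifier (that is, valid as an identifier start).
start_categories = Counter(
    unicodedata.category(chr(cp))
    for cp in range(0x10000)
    if chr(cp).isidentifier()
)

for category, count in sorted(start_categories.items()):
    print(category, count)
```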

Have fun.

ChrisA

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Rustom Mody

On Sunday, July 3, 2016 at 12:29:14 PM UTC+5:30, John Ladasky wrote:
> A while back, I shared my love for using Greek letters as variable names in 
> my Python (3.4) code -- when, and only when, they are warranted for improved 
> readability.  For example, I like to see the following:
> 
> 
> from math import pi as π
> 
> c = 2 * π * r
> 
> 
> When I am copying mathematical formulas from publications, and Greek letters 
> are used in that publication, I prefer to follow the text exactly as written.
> 
> Up until today, every character I've tried has been accepted by the Python 
> interpreter as a legitimate character for inclusion in a variable name.  Now 
> I'm copying a formula which defines a gradient.  The nabla symbol (∇) is used 
> in the naming of gradients.  Python isn't having it.  The interpreter throws 
> a "SyntaxError: invalid character in identifier" when it encounters the ∇.
> 
> I am now wondering what constitutes a valid character for an identifier, and 
> how they were chosen.  Obviously, the Western alphabet and standard Greek 
> letters work.  I just tried a few very weird characters from the Latin 
> Extended range, and some Cyrillic characters.  These are also fine.

https://docs.python.org/3.5/reference/lexical_analysis.html
points to
https://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html

Quite hardwired

> 
> A philosophical question.  Why should any character be excluded from a 
> variable name, besides the fact that it might also be an operator?
> 
> This might be a problem I can solve, I'm not sure.  Is there a file that the 
> Python interpreter refers to which defines the accepted variable name 
> characters?  Perhaps I could just add ∇.

You need to try something like

>>> import unicodedata as ud
>>> ud.category("∇")
'Sm'
>>> ud.category("A")
'Lu'
>>> ud.category("π")
'Ll'
>>> ud.category("a")
'Ll'

followed by figuring out why/what etc from (say)
https://en.wikipedia.org/wiki/Unicode_character_property

This is the way it IS
Not saying it SHOULD BE…
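
A slightly fuller version of the same lookup, as a sketch: describe is a hypothetical helper that reports the code point and official character name alongside the category, which makes it easier to find the character in the Unicode tables.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )


for ch in "∇π":
    print(ch, describe(ch))
```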

Re: Well, I finally ran into a Python Unicode problem, sort of

Lawrence D’Oliveiro :

> It wasn’t the “π” it was complaining about...

The question is why π is accepted but ∇ is not.

The immediate reason is that π is a letter while ∇ is not. But the
question, then, is why bother excluding nonletters from identifiers.

Personally, I don't think even π should be used in identifiers.
Mathematicians and physicists have a questionable tradition of using
single-character identifiers in their formulas. That shouldn't be
transported to programming.


Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Jussi Piitulainen

John Ladasky writes:

[- -]

> The nabla symbol (∇) is used in the naming of gradients.  Python isn't
> having it.  The interpreter throws a "SyntaxError: invalid character
> in identifier" when it encounters the ∇.
>
> I am now wondering what constitutes a valid character for an
> identifier, and how they were chosen.  Obviously, the Western alphabet
> and standard Greek letters work.  I just tried a few very weird
> characters from the Latin Extended range, and some Cyrillic
> characters.  These are also fine.

I think they merely extended the identifier syntax to Unicode: one or
more letters, underscores and digits, not starting with a digit. The
nabla symbol is not classified as a letter in Unicode, so it's not
allowed under this rule, and there is no other rule to allow it.
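
The rule as described can be tried out with str.isidentifier(), which applies the interpreter's own test. The candidate names below are made up for illustration.

```python
candidates = ["x1", "_x", "1x", "π", "naïve", "∇f", "a b"]

# True means the string is lexically valid as an identifier.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(repr(name), ok)
```

Note that isidentifier() checks only the lexical form; it does not reject reserved words such as "class".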

(Hm. Python seems to understand that the character occurs in what is
intended to be an identifier. Perhaps that's a default error message.)

Re: Well, I finally ran into a Python Unicode problem, sort of