Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-29 Thread Martin v. Löwis
Atsuo Ishimoto wrote:
 I'm +0.1 for non-ASCII identifiers, although module names should remain
 ASCII. ASCII identifiers might be encouraged, but as Martin said, it is
 very useful for some groups of users.

Thanks for these data. This mostly reflects my experience with German
and French users: some people would like to use non-ASCII identifiers
if they could, others argue they never would, as a matter of principle.
Of course, transliteration is more straightforward.

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-29 Thread Gustavo J. A. M. Carneiro
On Sat, 2005-10-29 at 10:56 +0200, Martin v. Löwis wrote:
 Atsuo Ishimoto wrote:
  I'm +0.1 for non-ASCII identifiers, although module names should remain
  ASCII. ASCII identifiers might be encouraged, but as Martin said, it is
  very useful for some groups of users.
 
 Thanks for these data. This mostly reflects my experience with German
 and French users: some people would like to use non-ASCII identifiers
 if they could, others argue they never would, as a matter of principle.
 Of course, transliteration is more straightforward.

  Not sure if anyone has made this point already, but Unicode
identifiers are also useful for math programs.  The ability to directly
type the math letters, like alpha, omega, etc., would actually make the
code more readable, while still understandable by programmers of all
nationalities.  For instance, you could write:

Δv = x1 - x0
if Δv < ε:
    return

Instead of:

delta_v = x1 - x0
if delta_v < epsilon:
    return

But anyone who is supposed to understand the code will be able to read
the delta and epsilon symbols.
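For readers on a current Python: non-ASCII identifiers were later adopted for Python 3 through PEP 3131, so Gustavo's example runs almost as written there. The sketch below wraps it in a function; the name `has_converged`, the default tolerance, and the use of `abs()` are illustrative assumptions, not part of the original message.

```python
def has_converged(x0: float, x1: float, ε: float = 1e-9) -> bool:
    """Return True when the step between two iterates is smaller than ε.

    Needs Python 3: Greek letters such as Δ and ε are valid identifiers
    there (PEP 3131), but they were not in the Python 2 of this thread.
    """
    Δv = x1 - x0
    # abs() so that a negative step also counts as "small"
    return abs(Δv) < ε


print(has_converged(1.0, 1.0 + 1e-12))  # True
print(has_converged(1.0, 2.0))          # False
```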

  Regards.

-- 
Gustavo J. A. M. Carneiro
[EMAIL PROTECTED] [EMAIL PROTECTED]
The universe is always one step beyond logic



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-29 Thread Antoine Pitrou

 Thanks for these data. This mostly reflects my experience with German
 and French users: some people would like to use non-ASCII identifiers
 if they could, others argue they never would, as a matter of principle.
 Of course, transliteration is more straightforward.

FWIW, being French, I don't remember hearing any programmer wish (s)he
could use non-ASCII identifiers, in any programming language. But
arguably transliteration is very straightforward (although a bit
lossy at times ;-)).

I think typeability and reproducibility should be weighed carefully.
It's nice to have the real letter Δ instead of "delta", but how do I
type it again on my non-Greek keyboard if I want to keep consistent
naming in the program?

ASCII is ethnocentric, but it can probably be typed easily on every
device in the world.

Also, as a matter of fact, if I type an identifier with an accented
letter inside, I would like Python to warn me, because it would be a
typing error on my part.

Maybe this should be an option at the beginning of any source file (like
the encoding declaration currently is). Or is this overkill?
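Antoine's wish for a warning can be approximated outside the interpreter. The following is a minimal sketch, assuming Python 3, of a hypothetical checker (`non_ascii_names` is an invented name, not a standard library or pylint API) that uses the standard `tokenize` module to report identifiers containing non-ASCII characters while ignoring accented text in comments and string literals.

```python
import io
import tokenize


def non_ascii_names(source: str) -> list[tuple[int, str]]:
    """Return (line number, identifier) pairs for non-ASCII NAME tokens."""
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # Comments and strings are separate token types (COMMENT, STRING),
        # so only identifiers and keywords reach this check.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


sample = 'zähler = 0\n# Zähler für Seiten\nzähler += 1\nname = "café"\n'
for line, name in non_ascii_names(sample):
    print(f"line {line}: non-ASCII identifier {name!r}")
```

Only lines 1 and 3 are reported; the accented comment and the string literal are left alone, which matches the distinction drawn in this thread between identifiers and other text.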




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-29 Thread Fabien Schwob
 FWIW, being French, I don't remember hearing any programmer wish (s)he
 could use non-ASCII identifiers, in any programming language. But
 arguably transliteration is very straightforward (although a bit
 lossy at times ;-)).
 
 I think typeability and reproducibility should be weighed carefully.
 It's nice to have the real letter Δ instead of "delta", but how do I
 type it again on my non-Greek keyboard if I want to keep consistent
 naming in the program?
 
 ASCII is ethnocentric, but it can probably be typed easily on every
 device in the world.
 
 Also, as a matter of fact, if I type an identifier with an accented
 letter inside, I would like Python to warn me, because it would be a
 typing error on my part.
 
 Maybe this should be an option at the beginning of any source file (like
 encoding currently). Or is this overkill?

I'm also French and I must say that I agree with you. In my case, the
most important thing is to be able to manage the _data_ in the right
encoding.

I'm currently trying to implement a little search engine in Python
(mainly to improve my skills) and the biggest problem I have to face is
how to manage encodings. Some web pages are in French, some in German,
some in English, etc., and it takes me a lot of time to handle this
problem correctly.

I think it's more useful to be able to simply manipulate the _data_ than
to have accents in identifiers.
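Fabien's problem, text arriving in different encodings, is about decoding data rather than naming variables. A minimal sketch of one common approach, assuming Python 3 and assuming the page's declared charset (for example from an HTTP header) is available as a string; the name `decode_page` and the fallback order (declared charset, then UTF-8, then Latin-1) are illustrative choices, not a standard.

```python
import codecs
from typing import Optional


def decode_page(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw page bytes to text, trying the declared charset first."""
    candidates = []
    if declared:
        try:
            # Normalizes aliases ("windows-1252" -> "cp1252");
            # an unknown label raises LookupError.
            candidates.append(codecs.lookup(declared).name)
        except LookupError:
            pass  # ignore a bogus charset label and use the defaults
    candidates.append("utf-8")
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never raises,
    # but it can silently produce wrong characters if the guess is wrong.
    return raw.decode("latin-1")


print(decode_page("café".encode("latin-1")))  # café
```

Trying UTF-8 before Latin-1 works because Latin-1 text with accented letters is rarely valid UTF-8, whereas Latin-1 accepts any byte sequence and so can only serve as the last resort.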

-- 
Behind every bug there is a developer, a man who made a mistake.
(Well, OK, sometimes it takes several of them.)



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-29 Thread Martin v. Löwis
Antoine Pitrou wrote:
 FWIW, being French, I don't remember hearing any programmer wish (s)he
 could use non-ASCII identifiers, in any programming language. But
 arguably transliteration is very straightforward (although a bit
 lossy at times ;-)).

My canonical example is François Pinard, who keeps requesting it,
saying that local people were surprised they couldn't use accented
characters in Python.

Perhaps that's because he actually is Québécois :-)

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-28 Thread Oren Tirosh
On 10/28/05, Neil Hodgson [EMAIL PROTECTED] wrote:
I used to work on software written by Japanese and English speakers
 at Fujitsu with most developers being Japanese. The rules were that
 comments could be in Japanese but identifiers were only allowed to
 contain ASCII characters. Most variable names were poorly chosen with
 s, p, q, fla (boolean=flag) and flafla being popular. When I asked
 some Japanese coders why they didn't use Japanese words expressed in
 ASCII (Romaji), their response was that it was a really weird idea.

This is anecdotal but it appears to me that transliterations are
 not commonly used apart from learning languages and some minimal help
 for foreigners such as including transliterated names on railway
 station name boards.

Israeli programmers generally use English identifiers, but
transliterations are common for local business terminology: types of
financial instruments, tax and insurance terminology, employee benefit
plans, etc. Yes, it looks weird, but it would be rather pointless to
try to translate them. Even native English speakers would find it
difficult to recognize the translations because they are used to using
them as loan words. Only transliteration (or possibly the use of
non-ASCII identifiers) would make sense in this situation and I do not
think it is unique to Israel.

BTW, I heard about a Cobol shop that had an explicit policy of using
only transliterated identifiers. This resulted in a much smaller
chance of hitting one of Cobol's numerous reserved words. Thankfully,
this is not an issue in Python...

  Oren


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-28 Thread Martin v. Löwis
Neil Hodgson wrote:
This is anecdotal but it appears to me that transliterations are
 not commonly used apart from learning languages and some minimal help
 for foreigners such as including transliterated names on railway
 station name boards.

That would be my guess also. Transliteration is clearly common for
Latin-based languages (French, German, Spanish, say), but I doubt
non-Latin scripts are that often transliterated (even if conventions
exist).
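To make the contrast concrete: for Latin-based scripts, a crude transliteration can be done mechanically by decomposing characters and dropping the combining accents, which is also exactly why it is lossy (German "zähler" would conventionally be written "zaehler", not "zahler"). A minimal sketch using only the standard `unicodedata` module, assuming Python 3; `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Naive transliteration: decompose (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["zähler", "François", "Δv"]:
    print(word, "->", strip_accents(word))
```

The Greek Δ passes through unchanged: a non-Latin script needs real transliteration tables or conventions, which this mechanical approach cannot provide.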

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-28 Thread Atsuo Ishimoto
Hello from Japan,

I googled discussions about non-ASCII identifiers in Japanese, but I
found no consensus. Major languages such as Java or VB support non-ASCII
identifiers, so projects that use non-ASCII identifiers in their programs
do exist. Not all Japanese programmers think this is a good idea.
Some people enthusiastically prefer Japanese identifiers, but some feel
it reduces readability and is hard to type, and some worry about tool
breakage or encoding problems, etc. It seems that smart people don't like
to express their preference for Japanese identifiers, maybe because they
think such a style is not cool, or they are afraid such a confession may
reveal their lack of English ability. ;)

I'm +0.1 for non-ASCII identifiers, although module names should remain
ASCII. ASCII identifiers might be encouraged, but as Martin said, it is
very useful for some groups of users.


On Sat, 29 Oct 2005 00:21:03 +0200
Martin v. Löwis [EMAIL PROTECTED] wrote:

 Neil Hodgson wrote:
 This is anecdotal but it appears to me that transliterations are
  not commonly used apart from learning languages and some minimal help
  for foreigners such as including transliterated names on railway
  station name boards.
 
 That would be my guess also. Transliteration is clearly common for
 Latin-based languages (French, German, Spanish, say), but I doubt
 non-Latin scripts are that often transliterated (even if conventions
 exist).
 

Yes, transliterations are rarely used in daily life in Japan. For
programming, I know of a lot of projects that use transliterated Japanese,
but I guess they are a minority.

--
Atsuo Ishimoto
[EMAIL PROTECTED]
Homepage:http://www.gembook.jp



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 Josiah Carlson wrote:
  According to Wikipedia (http://en.wikipedia.org/wiki/Latin_alphabet),
  various languages have adopted a transliteration of their language
  and/or former alphabets into Latin.  They don't purport to know all of
  the reasons why, and I'm not going to speculate.
  
  Whether or not more languages start using the Latin alphabet is a good
  question.  Basing judgement on history and likely globalization, it is
  only a matter of time before basically all languages have a
  transcription into the latin alphabet that is taught to all (unless
  China takes over the world).
 
 That is a very U.S.-centric view. I don't share it, but I think it is
 pointless to argue against it.

I should have included a ;).  Whether or not all languages use the
Latin alphabet in the future should have little to do with whether Python
chooses to support non-Latin identifiers in the forthcoming 2.5 or later
releases.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
You even argued against having non-ASCII identifiers:

http://mail.python.org/pipermail/python-list/2002-May/102936.html
 
 
 I see :-) It seems I have changed my mind since then (which
 apparently predates PEP 263).
 
 One issue I apparently was worried about was the plan to use
 native-encoding byte strings for the identifiers; this I didn't
 like at all.
 
 
* Unicode identifiers are going to introduce massive
code breakage - just think of all the tools people use
to manipulate Python code today; I'm quite sure that
most of it will fail in one way or another if you present
it with Unicode identifiers, such as in "zähler += 1".
 
 
 True. Today, I think I would be willing to accept the
 code breakage: these tools had quite some time to update
 themselves to PEP 263 (even though not all of them have
 done so yet); also, usage of the feature would only spread
 gradually. A failure to support the feature in the Python
 proper would be treated as a bug by us; how tool providers
 deal with the feature would be their choice.

I was thinking of introspection and debugging tools.
These would then see Unicode objects in the namespace
dictionaries and this will likely break a lot of code -
for much the same reason you see code breakage now
if you let Unicode objects enter the Python standard lib
without warning :-)
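A small illustration of this concern, assuming Python 3 semantics, where identifiers may be non-ASCII and every name is a `str`. The tool functions are hypothetical (`legacy_name_dump` and `robust_name_dump` are invented for this sketch): one that assumes pure-ASCII names fails as soon as such a name appears in a namespace dictionary, while one using the standard `backslashreplace` error handler keeps working.

```python
def legacy_name_dump(namespace: dict) -> list[bytes]:
    """Hypothetical old tool: assumes every name encodes to ASCII."""
    return [name.encode("ascii") for name in namespace]


def robust_name_dump(namespace: dict) -> list[bytes]:
    """Same idea, but escapes non-ASCII characters instead of failing."""
    return [name.encode("ascii", "backslashreplace") for name in namespace]


ns: dict = {}
exec("zähler = 0\nzähler += 1", ns)
# exec() adds __builtins__ to the globals dict; keep only user names.
user_names = {k: v for k, v in ns.items() if k != "__builtins__"}

print(robust_name_dump(user_names))  # [b'z\\xe4hler']
try:
    legacy_name_dump(user_names)
except UnicodeEncodeError as exc:
    print("legacy tool failed:", exc.reason)
```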

* People don't seem very interested in using Unicode
identifiers, e.g.

  http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html
 
 
 True. However, I also suspect that lack of tool support
 contributes to that. For the specific case of Java,
 there is no notion of source encoding, which makes Unicode
 identifiers really tedious to use.
 
 If it were really easy to use, I assume people would actually
 use it - at least in some of the contexts, like teaching,
 where Python is also widely used.

Well, that has two sides: of course, you'll always find
some people who will like a certain feature. The question
is what effect it has on the rest of us.

Python has always put some constraints on programmers
to improve code readability, e.g. significant whitespace.
Giving them Unicode identifiers sounds like a step
backwards in this context.

Note that I'm not talking about comments, string literal
contents, etc. - only the programming logic, i.e. keywords
and identifiers.

Do you really think that it will help with code readability
if programmers are allowed to use native scripts for their
identifiers ?
 
 
 Yes, I do - for some groups of users. Of course, code sharing
 would be more difficult, and there certainly should be a policy
 to use only ASCII in the standard library. But within local
 groups, users would find understanding code easier if they
 knew what the identifiers actually meant.

Hmm, but why do you think they wouldn't understand the meaning of
ASCII versions of the identifiers ?

Note that using ASCII doesn't necessarily mean that you
have to use English as basis for the naming schemes of
identifiers.

If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time. Integrating such
code into other applications will be even harder, since you'd
be forced to use his Japanese class names in your application.
 
 
 Certainly, yes. There is a trade-off: you can make it easier
 for some people to read and write code if they can use their
 native script; at the same time, it would be harder for others
 to read and modify it.
 
 It's a policy decision whether you use English identifiers or
 not - it shouldn't be a technical decision (as it currently
 is).

See above: ASCII != English. Most scripts have a transliteration
into ASCII - simply because that's the global standard for
scripts.

I think source code encodings provide an ideal way to
have comments written in native scripts - and people
use that a lot. However, keeping the program code itself
in plain ASCII makes it far more readable and reusable
across locales. Something that's important in this
globalized world.
 
 
 Certainly. However, some programs don't need to live in
 a globalized world - e.g. if they are homework in a school.
 Within a locale, using native scripts would make the program
 more readable.

True.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 27 2005)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Greg Ewing wrote:
 M.-A. Lemburg wrote:
 
 
If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time.
 
 
 Or you could look upon it as an opportunity to
 broaden your mental horizons by learning some
 Japanese. :-)

I just took Japanese as an example of a language and script
that I don't know anything about. I would actually love
to learn some Japanese, but simply don't have the time
to learn it.

Anyway, I could just as well have chosen Tibetan, Thai or Limbu
scripts (which all look very nice, BTW):

http://www.unicode.org/charts/

Perhaps it's not that bad after all - I just don't think that
it will help code readability in the long run.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread Martin v. Löwis
Greg Ewing wrote:
 I still think this is a much worse potential problem
 than that of l vs 1, etc. It's reasonable to
 adopt the practice of never using l as a single
 letter identifier, for example. But it would be
 unreasonable to ban the use of E as an identifier
 on the grounds that someone somewhere might confuse
 it with a capital epsilon.

As a style guideline, people should use single-letter
identifiers only for local variables. If they follow
the guideline, it should be easy to tell whether
such an identifier is Latin or Greek (if everything
else in the function is Latin, the E likely is as
well).

 An alternative would be to identify such confusable
 letters in the various alphabets and define them
 to be equivalent.

pylint could check for such things (although I very
much doubt it would have any hits in the next 10
years).
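The kind of check Martin suggests leaving to a lint tool might look roughly like this. It is a minimal sketch, not pylint's actual behaviour, and it assumes that the first word of a character's Unicode name ("LATIN", "GREEK", "CYRILLIC", ...) is a good enough stand-in for its script, which is a simplification of the real Unicode Script property. `scripts_used` and `is_mixed_script` are hypothetical names.

```python
import unicodedata


def scripts_used(identifier: str) -> set[str]:
    """Approximate the set of scripts used in an identifier.

    Digits and underscores are skipped, since they are expected in
    identifiers written in any script.
    """
    found = set()
    for ch in identifier:
        if ch == "_" or ch.isdigit():
            continue
        # e.g. "LATIN CAPITAL LETTER E" -> "LATIN"
        found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def is_mixed_script(identifier: str) -> bool:
    return len(scripts_used(identifier)) > 1


print(is_mixed_script("E\u0395"))    # Latin E + Greek Epsilon: True
print(is_mixed_script("counter_2"))  # False
print(is_mixed_script("счётчик2"))   # Cyrillic letters + digit: False
```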

 And beyond the issue of alphabets there's also the
 question of whether accented characters should be
 considered distinct. I can see quite a few holy
 flame wars erupting over that...

For that, there is the Unicode TR that precisely
defines how this should be done. People should then
have their wars with the Unicode consortium.
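On whether differently encoded accented letters are distinct: when non-ASCII identifiers were eventually adopted for Python 3 (PEP 3131), identifiers were specified to be normalized to NFKC while parsing, so a precomposed é and an e followed by a combining acute accent name the same variable. A short sketch, assuming Python 3:

```python
import unicodedata

precomposed = "caf\u00e9"  # é as a single code point
decomposed = "cafe\u0301"  # e + COMBINING ACUTE ACCENT

# The two spellings differ as strings...
print(precomposed == decomposed)                                 # False
# ...but agree after NFKC normalization.
print(unicodedata.normalize("NFKC", decomposed) == precomposed)  # True

# The parser normalizes identifiers the same way, so this binding
# ends up stored under the precomposed name.
namespace: dict = {}
exec(decomposed + " = 1", namespace)
print(precomposed in namespace)                                  # True
```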

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread Neil Hodgson
Josiah Carlson:

 According to wikipedia (http://en.wikipedia.org/wiki/Latin_alphabet),
 various languages have adopted a transliteration of their language
 and/or former alphabets into latin.  They don't purport to know all of
 the reasons why, and I'm not going to speculate.

   I used to work on software written by Japanese and English speakers
at Fujitsu with most developers being Japanese. The rules were that
comments could be in Japanese but identifiers were only allowed to
contain ASCII characters. Most variable names were poorly chosen with
s, p, q, fla (boolean=flag) and flafla being popular. When I asked
some Japanese coders why they didn't use Japanese words expressed in
ASCII (Romaji), their response was that it was a really weird idea.

   This is anecdotal but it appears to me that transliterations are
not commonly used apart from learning languages and some minimal help
for foreigners such as including transliterated names on railway
station name boards.

   Neil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Martin v. Löwis
Josiah Carlson wrote:
 In this case it's not just a misreading, the characters look identical! 
 When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
 characters will or will not be used as identifiers, when those
 characters are keys on a keyboard of a specific type, is pretty
 presumptuous.

Why is that rude and disrespectful? I'm certainly respecting developers
who want to use their scripts for identifiers, or else I would not have
suggested that they could do so.

However, from experience with my own language and the three or so
foreign languages I know, I can tell you that people normally
don't mix identifiers of different scripts.

 Sure, that example was made up, but there are words which have been
 stolen from various languages by English, and you are discounting the
 case of single-letter temporary variables.  Saying what will and won't
 happen over the course of using unicode identifiers is quite the
 prediction.

Sure, people can make mistakes. They get an error, and then will
need to find the cause of the problem. Sometimes, this will be easy,
and sometimes, it will not.

 Indeed, they are similar, but _different_ in my font as well.  The trick
 is that the glyphs are not different in the case of certain Greek or
 Cyrillic letters.  They don't just /look/ similar they /are identical/.

This string: "EΕ" is the LATIN CAPITAL LETTER E, followed by the GREEK
CAPITAL LETTER EPSILON. In the font my email composer uses, the E is
slightly larger than the Epsilon - so there /is/ a visual difference.

But even if there isn't: if this was a frequent problem, the name
error could include an alternative representation (say, with Unicode
ordinals for non-ASCII characters) which would give an easy visual
clue.
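Martin's idea of a name error that spells out non-ASCII characters could be sketched as follows, assuming Python 3. `describe_name` is a hypothetical helper, not something the interpreter provides; it relies only on the standard `backslashreplace` error handler to produce the escaped spelling.

```python
def describe_name(name: str) -> str:
    """Build a NameError-style message that also shows escaped code points."""
    escaped = name.encode("ascii", "backslashreplace").decode("ascii")
    if escaped == name:
        return f"name {name!r} is not defined"
    return f"name {name!r} is not defined (spelled {escaped})"


print(describe_name("E\u0395"))  # Latin E followed by Greek capital Epsilon
print(describe_name("EE"))
```

For the look-alike pair the message ends in `(spelled E\u0395)`, which gives the visual clue Martin describes even when the two glyphs are rendered identically.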

I still doubt that this is a frequent problem, and I don't see any
better grounds for claiming that it is than for claiming that it
is not.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Martin v. Löwis
Greg Ewing wrote:
 Would it help if an identifier were required to be
 made up of letters from the same alphabet, e.g. all
 Latin or all Greek or all Cyrillic, but not a mixture.
 Then you'd get an immediate error if you accidentally
 slipped in a letter from the wrong alphabet.

Not in the literal sense: you certainly want to allow
Latin digits in, say, a Cyrillic identifier. See

http://www.unicode.org/reports/tr31/

for what the Unicode Consortium recommends.
In addition to the strict specification, they envision
usage guidelines. This seems Pythonic: just because
you could potentially shoot yourself in the foot doesn't
mean it should be banned from the language.

IOW, whether it would help largely depends on whether
the problem is real in the first place. Just because
you *can* come up with look-alike identifiers doesn't
mean that people will use them, or that they will mistake
the scripts (except for deliberately doing so, of
course).
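For reference, Python 3 later based its identifier syntax on the XID_Start/XID_Continue properties from the Unicode report linked above (UAX #31), and `str.isidentifier()` exposes that check. As Martin wants, ASCII digits may follow Cyrillic letters, while an identifier still cannot start with a digit; note that the syntax alone does not forbid mixing scripts, which is left to usage guidelines. A brief sketch, assuming Python 3:

```python
candidates = [
    "счётчик2",  # Cyrillic letters followed by an ASCII digit
    "2счётчик",  # starts with a digit: not an identifier
    "Δv",        # Greek letter followed by a Latin letter
    "x-y",       # a hyphen is not an identifier character
]
for text in candidates:
    print(f"{text!r:12} isidentifier() -> {text.isidentifier()}")
```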

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Walter Dörwald
On 25.10.2005 at 23:40, Josiah Carlson wrote:

 [...]
 Identically drawn glyphs are a problem, and pretending that they
 aren't a problem, doesn't make it so.  Right now, all possible name
 glyphs are visually distinct, which would not be the case if any
 unicode character could be used as a name (except for numerals).
 Speaking of which, would we then be offering support for arabic/indic
 numeric literals, and/or support it in int()/float()?

It's already supported in int() and float():

>>> int(u"\u136c\u2082")
42
>>> float(u"\u0664\u09e8")
42.0

But not as literals:

# -*- coding: unicode-escape -*-

print \u136c\u2082

This gives (on the Mac):

  File "encoding.py", line 3
 print ፬₂
   ^
SyntaxError: invalid syntax
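Walter's session uses Python 2 syntax. A sketch of the corresponding behaviour in Python 3, under these assumptions: `int()` and `float()` accept digits from other scripts that Unicode classifies as decimal digits, while the parser accepts only ASCII digits in numeric literals. The Ethiopic digit and subscript two from the session are deliberately left out, because Unicode classifies them as digits but not as decimal digits. `compiles_as_expression` is a hypothetical helper.

```python
arabic_indic_four = "\u0664"  # ARABIC-INDIC DIGIT FOUR
bengali_two = "\u09e8"        # BENGALI DIGIT TWO
text = arabic_indic_four + bengali_two

print(text.isdecimal())  # True: both are Unicode decimal digits
print(int(text))         # 42
print(float(text))       # 42.0


def compiles_as_expression(source: str) -> bool:
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles_as_expression("42"))  # True
print(compiles_as_expression(text))  # False: literals need ASCII digits
```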

 [...]

Bye,
Walter Dörwald



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
A few years ago we had a discussion about this on python-dev
and agreed to stick with ASCII identifiers for Python. I still
think that's the right way to go.
 
 I don't think there ever was such an agreement.

You even argued against having non-ASCII identifiers:

http://mail.python.org/pipermail/python-list/2002-May/102936.html

and I agree with you on most of the points you make in that
posting:

* Unicode identifiers are going to introduce massive
code breakage - just think of all the tools people use
to manipulate Python code today; I'm quite sure that
most of it will fail in one way or another if you present
it with Unicode identifiers, such as in "zähler += 1".

* People don't seem very interested in using Unicode
identifiers, e.g.

  http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html

Most of the few who did comment said they'd rather have
ASCII identifiers, e.g.

  http://mail.python.org/pipermail/python-list/2002-May/104050.html


Do you really think that it will help with code readability
if programmers are allowed to use native scripts for their
identifiers ?

I think this goes beyond just visual aspects of being able
to distinguish graphemes:

If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time. Integrating such
code into other applications will be even harder, since you'd
be forced to use his Japanese class names in your application.
This doesn't only introduce problems with being able to enter
the Japanese identifiers, it will also cause your application
to suddenly contain identifiers in Japanese even though that's
not your native script.

I think source code encodings provide an ideal way to
have comments written in native scripts - and people
use that a lot. However, keeping the program code itself
in plain ASCII makes it far more readable and reusable
across locales. Something that's important in this
globalized world.
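The source-encoding mechanism referred to here is the PEP 263 coding declaration. In Python 3 the standard library can report which encoding a file declares via `tokenize.detect_encoding`; a short sketch, assuming Python 3, where UTF-8 became the default when no declaration is present. The sample bytes are invented for illustration.

```python
import io
import tokenize

# A Latin-1 file with a German comment, and a file with no declaration.
declared = b"# -*- coding: latin-1 -*-\n# Z\xe4hler f\xfcr Seiten\ncount = 0\n"
undeclared = b"count = 0\n"

encoding, _ = tokenize.detect_encoding(io.BytesIO(declared).readline)
print(encoding)  # iso-8859-1 (the "latin-1" alias is normalized)

encoding, _ = tokenize.detect_encoding(io.BytesIO(undeclared).readline)
print(encoding)  # utf-8, the Python 3 default
```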

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Martin v. Löwis
M.-A. Lemburg wrote:
 You even argued against having non-ASCII identifiers:
 
 http://mail.python.org/pipermail/python-list/2002-May/102936.html

I see :-) It seems I have changed my mind since then (which
apparently predates PEP 263).

One issue I apparently was worried about was the plan to use
native-encoding byte strings for the identifiers; this I didn't
like at all.

 * Unicode identifiers are going to introduce massive
 code breakage - just think of all the tools people use
 to manipulate Python code today; I'm quite sure that
 most of it will fail in one way or another if you present
 it with Unicode identifiers, such as in "zähler += 1".

True. Today, I think I would be willing to accept the
code breakage: these tools had quite some time to update
themselves to PEP 263 (even though not all of them have
done so yet); also, usage of the feature would only spread
gradually. A failure to support the feature in the Python
proper would be treated as a bug by us; how tool providers
deal with the feature would be their choice.

 * People don't seem very interested in using Unicode
 identifiers, e.g.
 
   http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html

True. However, I also suspect that lack of tool support
contributes to that. For the specific case of Java,
there is no notion of source encoding, which makes Unicode
identifiers really tedious to use.

If it were really easy to use, I assume people would actually
use it - at least in some of the contexts, like teaching,
where Python is also widely used.

 Do you really think that it will help with code readability
 if programmers are allowed to use native scripts for their
 identifiers ?

Yes, I do - for some groups of users. Of course, code sharing
would be more difficult, and there certainly should be a policy
to use only ASCII in the standard library. But within local
groups, users would find understanding code easier if they
knew what the identifiers actually meant.

 If you are told to debug a program
 written by say a Japanese programmer using Japanese identifiers
 you are going to have a really hard time. Integrating such
 code into other applications will be even harder, since you'd
 be forced to use his Japanese class names in your application.

Certainly, yes. There is a trade-off: you can make it easier
for some people to read and write code if they can use their
native script; at the same time, it would be harder for others
to read and modify it.

It's a policy decision whether you use English identifiers or
not - it shouldn't be a technical decision (as it currently
is).

 I think source code encodings provide an ideal way to
 have comments written in native scripts - and people
 use that a lot. However, keeping the program code itself
 in plain ASCII makes it far more readable and reusable
 across locales. Something that's important in this
 globalized world.

Certainly. However, some programs don't need to live in
a globalized world - e.g. if they are homework in a school.
Within a locale, using native scripts would make the program
more readable.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 M.-A. Lemburg wrote:
  You even argued against having non-ASCII identifiers:
  
  http://mail.python.org/pipermail/python-list/2002-May/102936.html
  
  Do you really think that it will help with code readability
  if programmers are allowed to use native scripts for their
  identifiers ?
 
 Yes, I do - for some groups of users. Of course, code sharing
 would be more difficult, and there certainly should be a policy
 to use only ASCII in the standard library. But within local
 groups, users would find understanding code easier if they
 knew what the identifiers actually meant.

According to wikipedia (http://en.wikipedia.org/wiki/Latin_alphabet),
various languages have adopted a transliteration of their language
and/or former alphabets into Latin.  They don't purport to know all of
the reasons why, and I'm not going to speculate.

Whether or not more languages start using the latin alphabet is a good
question.  Basing judgement on history and likely globalization, it is
only a matter of time before basically all languages have a
transcription into the Latin alphabet that is taught to all (unless
China takes over the world).

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-26 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  In this case it's not just a misreading, the characters look identical! 
  When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
  characters will or will not be used as identifiers, when those
  characters are keys on a keyboard of a specific type, is pretty
  presumptuous.
 
 Why is that rude and disrespectful? I'm certainly respecting developers
 who want to use their scripts for identifiers, or else I would not have
 suggested that they could do so.

I never said "rude", I said "presumptuous": "Going beyond what is right or
proper; excessively forward." (according to dictionary.com, the OED has
a similar definition).  I was trying to say that in stating that users
wouldn't be using keys on their keyboard in their natural language when
also using English characters, you were assuming a bit about their
usage patterns that you perhaps shouldn't.  I certainly could also be
presumptuous in stating that users may very well mix certain languages,
but it seems to be more likely given that keywords and the standard
library use the Latin alphabet.


  Indeed, they are similar, but _different_ in my font as well.  The trick
  is that the glyphs are not different in the case of certain greek or
  cyrillic letters.  They don't just /look/ similar they /are identical/.
 
 This string: EΕ is the LATIN CAPITAL LETTER E, followed by the GREEK
 CAPITAL LETTER EPSILON. In the font my email composer uses, the E is
 slightly larger than the Epsilon - so there /is/ a visual difference.

My email client doesn't handle unicode, but a quick check by swapping
fonts in a word processor shows that, at least on my platform, all
three are the same glyph (same size, shape, ...) in every fixed-width
font. If a platform distinguishes all three, then one should consider
one's platform lucky.  Not all platforms or users' preferred fonts
are.

 But even if there isn't: if this was a frequent problem, the name
 error could include an alternative representation (say, with Unicode
 ordinals for non-ASCII characters) which would give an easy visual
 clue.

It would offer a great cue, but I'm not sure if it is possible.  I think
that it sounds like an ugly discussion of stdout/err encodings and
exception handling machinery that I don't want to be a part of.
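For what it's worth, the display side of Martin's suggestion is easy to sketch in today's Python 3. The helper below, `describe_identifier` (a hypothetical name, not part of any Python release), spells out every non-ASCII character in a name as its code point and Unicode character name, the kind of alternative representation a NameError message could include. Wiring it into the exception machinery is not shown.

```python
import unicodedata


def describe_identifier(name: str) -> str:
    """Spell out non-ASCII characters so look-alike letters become visible.

    ASCII characters are kept as-is; every other character is replaced by
    its code point and Unicode name, e.g. <U+0395 GREEK CAPITAL LETTER EPSILON>.
    """
    parts = []
    for ch in name:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            # unicodedata.name() raises ValueError for unnamed code points
            # unless a default is given, so fall back to a placeholder.
            parts.append(f"<U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}>")
    return "".join(parts)


print(describe_identifier("PEN"))        # PEN
print(describe_identifier("P\u0395N"))   # P<U+0395 GREEK CAPITAL LETTER EPSILON>N
```

An all-ASCII name comes back unchanged, so the extra detail would only appear when a look-alike is actually involved.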

 I still doubt that this is a frequent problem, and I don't see any
 better grounds for claiming that it is than for claiming that it
 is not.

Whether or not it is frequent will depend on the prevalence of desire to
use those characters.  While I don't think that such uses will be as
common as using 'klass' when passing a class, I do think that it will
result in more than a few SourceForge bug reports.  I also share Marc-Andre
Lemburg's concerns about the understandability of code written in Kanji,
Hebrew, Arabic, etc., at least for those who have not memorized the
entirety of those alphabets.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-26 Thread Martin v. Löwis
Josiah Carlson wrote:
 According to wikipedia (http://en.wikipedia.org/wiki/Latin_alphabet),
 various languages have adopted a transliteration of their language
 and/or former alphabets into latin.  They don't purport to know all of
 the reasons why, and I'm not going to speculate.
 
 Whether or not more languages start using the latin alphabet is a good
 question.  Basing judgement on history and likely globalization, it is
 only a matter of time before basically all languages have a
 transcription into the latin alphabet that is taught to all (unless
 China takes over the world).

That is a very U.S.-centric view. I don't share it, but I think it is
pointless to argue against it.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-26 Thread Greg Ewing
M.-A. Lemburg wrote:

 If you are told to debug a program
 written by say a Japanese programmer using Japanese identifiers
 you are going to have a really hard time.

Or you could look upon it as an opportunity to
broaden your mental horizons by learning some
Japanese. :-)

-- 
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Fredrik Lundh
M.-A. Lemburg wrote:

 I don't follow you here. The source code encoding
 is only applied to Unicode literals (you are using string
 literals in your example). String literals are passed
 through as-is.

however, for Python 3000, it would be nice if the source-code encoding applied
to the *entire* file (XML-style), rather than just unicode string literals and
(hopefully) comments and docstrings.

/F


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 
I don't follow you here. The source code encoding
is only applied to Unicode literals (you are using string
literals in your example). String literals are passed
through as-is.
 
 
 however, for Python 3000, it would be nice if the source-code encoding applied
 to the *entire* file (XML-style), rather than just unicode string literals
 and (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file:
the file is transcoded into UTF-8 and then parsed by the
Python parser.

Unicode literals are then decoded from the UTF-8 into Unicode.
String literals are transcoded back into the source code encoding,
thus making the (rather long, due to technical constraints) round-trip
source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
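A minimal Python 3 model of the round trip described above, assuming a file declared as Latin-1. It only mirrors the sequence of conversions for a plain string literal and for a Unicode literal; it is not how the parser is actually written, and the function names are hypothetical.

```python
SOURCE_ENCODING = "latin-1"  # assumption: the file's declared source encoding


def as_byte_string_literal(raw: bytes) -> bytes:
    """Follow a plain string literal through the described round trip."""
    text = raw.decode(SOURCE_ENCODING)   # source code encoding -> Unicode
    utf8 = text.encode("utf-8")          # Unicode -> UTF-8 (what the parser sees)
    back = utf8.decode("utf-8")          # UTF-8 -> Unicode
    return back.encode(SOURCE_ENCODING)  # Unicode -> source code encoding


def as_unicode_literal(raw: bytes) -> str:
    """A Unicode literal stops once the UTF-8 is decoded back to Unicode."""
    return raw.decode(SOURCE_ENCODING).encode("utf-8").decode("utf-8")


literal = b"Gr\xfc\xdfe"  # "Grüße" as it appears in a Latin-1 file
print(as_byte_string_literal(literal))  # b'Gr\xfc\xdfe'
print(as_unicode_literal(literal))      # Grüße
```

Every step is lossless for Latin-1, so the plain string literal comes back byte for byte; what the extra steps cost is transcoding work, not data.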

Python 3k should have a fully Unicode based parser to reduce this
additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with
string literals will go away all by themselves :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 25 2005)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Martin v. Löwis
Fredrik Lundh wrote:
 however, for Python 3000, it would be nice if the source-code encoding applied
 to the *entire* file (XML-style), rather than just unicode string literals
 and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file.
However, because of the Python syntax, you are restricted to ASCII
in many places, such as keywords, number literals, and (unfortunately)
identifiers. Lifting the restriction on identifiers is on my agenda.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Fredrik Lundh wrote:
  however, for Python 3000, it would be nice if the source-code encoding applied
  to the *entire* file (XML-style), rather than just unicode string literals
  and (hopefully) comments and docstrings.
 
 As MAL explains, the encoding currently does apply to the entire file.
 However, because of the Python syntax, you are restricted to ASCII
 in many places, such as keywords, number literals, and (unfortunately)
 identifiers. Lifting the restriction on identifiers is on my agenda.

It seems that removing this restriction may cause serious issues, at
least in the case when using cyrillic characters in names.  See recent
security issues in regards to web addresses in web browsers for the
confusion (and/or name errors) that could result in their use.

While I agree in principle that people should be able to use the
entirety of one's own natural language in writing software in
programming languages, I think that it is an ugly can of worms that
perhaps shouldn't be opened.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
 It seems that removing this restriction may cause serious issues, at
 least in the case when using cyrillic characters in names.  See recent
 security issues in regards to web addresses in web browsers for the
 confusion (and/or name errors) that could result in their use.

That impression is deceiving. We are talking about source code here;
people type in identifiers explicitly rather than receiving them
through linking, and they scope identifiers (by module or object).

If somebody manages to get look-alike identifiers into your Python
libraries, you have bigger problems than these look-alikes: anybody
capable of doing so could just as well replace the real thing in
the first place.

As always in computer security: define your threat model before
reasoning about the risks.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  It seems that removing this restriction may cause serious issues, at
  least in the case when using cyrillic characters in names.  See recent
  security issues in regards to web addresses in web browsers for the
  confusion (and/or name errors) that could result in their use.
 
 That impression is deceiving. We are talking about source code here;
 people type in identifiers explicitly rather than receiving them
 through linking, and they scope identifiers (by module or object).
 
 If somebody manages to get look-alike identifiers into your Python
 libraries, you have bigger problems than these look-alikes: anybody
 capable of doing so could just as well replace the real thing in
 the first place.
 
 As always in computer security: define your threat model before
 reasoning about the risks.

I should have been more explicit.  I did not mean to imply that I was
concerned about the security implications of inserting arbitrary
identifiers in Python (I was mentioning the web browser case for
an example of how such characters have been confusing previously), I am
concerned about confusion involved with using:
Greek Capital: Alpha, Beta, Epsilon, Zeta, Eta, Iota, Kappa, Mu, Nu,
Omicron, Rho, and Tau.
Cyrillic Capital: Dze, Je, A, Ve, Ie, Em, En, O, Er, Es, Te, Ha, ...

And how users could say, "name error? But I typed in window.draw(PEN) as
I was told to, and it didn't work!"


Identically drawn glyphs are a problem, and pretending that they aren't
a problem, doesn't make it so.  Right now, all possible name glyphs are
visually distinct, which would not be the case if any unicode character
could be used as a name (except for numerals).  Speaking of which, would
we then be offering support for Arabic/Indic numeric literals, and/or
support it in int()/float()?  Ideally I would like to say yes, but I
could see the confusion if such were allowed.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread M.-A. Lemburg
Josiah Carlson wrote:
 Martin v. Löwis [EMAIL PROTECTED] wrote:
 
Fredrik Lundh wrote:

however, for Python 3000, it would be nice if the source-code encoding applied
to the *entire* file (XML-style), rather than just unicode string literals
and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file.
However, because of the Python syntax, you are restricted to ASCII
in many places, such as keywords, number literals, and (unfortunately)
identifiers. Lifting the restriction on identifiers is on my agenda.
 
 
 It seems that removing this restriction may cause serious issues, at
 least in the case when using cyrillic characters in names.  See recent
 security issues in regards to web addresses in web browsers for the
 confusion (and/or name errors) that could result in their use.
 
 While I agree in principle that people should be able to use the
 entirety of one's own natural language in writing software in
 programming languages, I think that it is an ugly can of worms that
 perhaps shouldn't be opened.

I agree with Josiah.

A few years ago we had a discussion about this on python-dev
and agreed to stick with ASCII identifiers for Python. I still
think that's the right way to go.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 Identically drawn glyphs are a problem, and pretending that they aren't
 a problem, doesn't make it so.  Right now, all possible name glyphs are
 visually distinct, which would not be the case if any unicode character
 could be used as a name (except for numerals).  Speaking of which, would
 we then be offering support for arabic/indic numeric literals, and/or
 support it in int()/float()?  Ideally I would like to say yes, but I
 could see the confusion if such were allowed.

This problem isn't new. There are plenty of fonts where 1 and l are
hard to distinguish, or l and I for that matter, or O and 0.

Yes, we need better tools to diagnose this.

No, we shouldn't let this stop us from adding such a feature if it is
otherwise a good feature.
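One shape such a diagnostic tool could take, sketched under loud assumptions: map each character to a canonical look-alike and report names that collapse to the same "skeleton". The idea follows the confusable detection in Unicode Technical Standard #39, but the `CONFUSABLES` table below is a tiny hand-picked stand-in, not the real Unicode confusables data, and `skeleton`/`find_lookalikes` are hypothetical names.

```python
# Hand-picked stand-in for the Unicode confusables data; a real tool would
# load the full table rather than hard-code a few entries.
CONFUSABLES = {
    "1": "l",        # DIGIT ONE
    "I": "l",        # LATIN CAPITAL LETTER I
    "0": "O",        # DIGIT ZERO
    "\u0395": "E",   # GREEK CAPITAL LETTER EPSILON
    "\u0415": "E",   # CYRILLIC CAPITAL LETTER IE
    "\u0420": "P",   # CYRILLIC CAPITAL LETTER ER
}


def skeleton(name: str) -> str:
    """Replace each character by its canonical look-alike, if it has one."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def find_lookalikes(names):
    """Group distinct names that share a skeleton; return only real clashes."""
    groups = {}
    for name in names:
        groups.setdefault(skeleton(name), set()).add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_lookalikes(["Fool", "Foo1", "FooI", "bar"]))
# [['Foo1', 'FooI', 'Fool']]
print(find_lookalikes(["PEN", "P\u0395N", "\u0420EN"]))
```

The same check covers both the old ASCII look-alikes and the new cross-script ones, which is the point Guido makes: the problem is not new, only the table gets bigger.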

I'm not so sure about this for other reasons -- it hampers code
sharing, and as soon as you add right-to-left character sets to the
mix (or top-to-bottom, for that matter), displaying source code is
going to be near impossible for most tools (since the keywords and
standard library module names will still be in the Latin alphabet).
This actually seems a killer even for allowing Unicode in comments,
which I'd otherwise favor. What do Unicode-aware apps generally do
with right-to-left characters?
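Rendering mixed-direction text is governed by the Unicode Bidirectional Algorithm (UAX #9), which is far beyond a snippet. A tool's first step, though, is knowing which characters are right-to-left, and Python 3's `unicodedata.bidirectional()` reports each character's bidi class. A minimal sketch, where `contains_rtl` is a hypothetical helper:

```python
import unicodedata

# Bidi classes of strong right-to-left characters: "R" (e.g. Hebrew)
# and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def contains_rtl(line: str) -> bool:
    """True if any character on the line is a strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in line)


print(unicodedata.bidirectional("A"))       # L
print(unicodedata.bidirectional("\u05d0"))  # R  (HEBREW LETTER ALEF)
print(unicodedata.bidirectional("\u0627"))  # AL (ARABIC LETTER ALEF)
print(contains_rtl("x = 1"))                # False
print(contains_rtl("\u05d0 = 1"))           # True
```

Such a check only flags lines that need bidi-aware handling; it does not reorder anything for display.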

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
 And how users could say, name error? But I typed in window.draw(PEN) as
 I was told to, and it didn't work!

Ah, so the serious issues you are talking about are not security 
issues, but usability issues.

I don't think extending the range of acceptable characters will
cause any additional confusion. Users are already getting surprising
NameErrors/AttributeErrors in the following cases:
- they just misspell the identifier, and then, when the error message
   is printed, fail to recognize the difference, as they read over the
   typo just like they read over it when mistyping it in the first place.

- they run into confusions with different things having the same names
   in different contexts. For example, they wonder why they get TypeError
   for passing the wrong number of arguments to a function, when the
   call matches exactly what the source code in front of them tells
   them - only that they were calling a different function which just
   happened to have the same name.

In the light of these common mistakes, your example with an identifier
named PEN, where the P might be a cyrillic letter or the E a greek one
is just made up: For window.draw, people will readily understand that
they are supposed to use Latin letters. More generally, they will know
what script to use just from looking at the identifier.

 Identically drawn glyphs are a problem, and pretending that they aren't
 a problem, doesn't make it so.  Right now, all possible name glyphs are
 visually distinct

Not at all: Just compare "Fool" and "Foo1" (and perhaps "FooI")

In the font in which I'm typing this, these are slightly different - but
there are fonts in which the difference is really difficult to
recognize.

 Speaking of which, would
 we then be offering support for arabic/indic numeric literals, and/or
 support it in int()/float()?

No. None of the Arabic users have ever requested such a feature, so
it would be stupid to provide it. We provide extended identifiers not
for the fun of it, but because users are requesting them.
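For comparison with the int()/float() question: in today's Python 3, numeric literals in source code are still ASCII-only, but the string conversions int() and float() accept any Unicode decimal digit (general category Nd). A short demonstration with Arabic-Indic digits:

```python
import unicodedata

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

print(int(arabic_indic))                                # 123
print(float("\u0663.\u0665"))                           # 3.5
print([unicodedata.decimal(ch) for ch in arabic_indic])  # [1, 2, 3]
```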

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Martin v. Löwis
M.-A. Lemburg wrote:
 A few years ago we had a discussion about this on python-dev
 and agreed to stick with ASCII identifiers for Python. I still
 think that's the right way to go.

I don't think there ever was such an agreement.

Regards,
Martin



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  And how users could say, name error? But I typed in window.draw(PEN) as
  I was told to, and it didn't work!
 
 Ah, so the serious issues you are talking about are not security 
 issues, but usability issues.

Indeed, it was a misunderstanding, as the email stated:
"I did not mean to imply that I was concerned about the security
implications of inserting arbitrary identifiers in Python (I was
mentioning the web browser case for an example of how such
characters have been confusing previously), I am concerned about
confusion involved with using: [glyphs which are identical]"


 I don't think extending the range of acceptable characters will
 cause any additional confusion. Users are already getting surprising
 NameErrors/AttributeErrors in the following cases:
 - they just misspell the identifier, and then, when the error message
is printed, fail to recognize the difference, as they read over the
typo just like they read over it when mistyping it in the first place.

In this case it's not just a misreading, the characters look identical! 
When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
characters will or will not be used as identifiers, when those
characters are keys on a keyboard of a specific type, is pretty
presumptuous.


 - they run into confusions with different things having the same names
in different contexts. For example, they wonder why they get TypeError
for passing the wrong number of arguments to a function, when the
call matches exactly what the source code in front of them tells
them - only that they were calling a different function which just
happened to have the same name.

Right, and users should be reading the documentation for the functions
and methods they are calling.


 In the light of these common mistakes, your example with an identifier
 named PEN, where the P might be a cyrillic letter or the E a greek one
 is just made up: For window.draw, people will readily understand that
 they are supposed to use Latin letters. More generally, they will know
 what script to use just from looking at the identifier.

Sure, that example was made up, but there are words which have been
stolen from various languages by English, and you are discounting the
case of single-letter temporary variables.  Saying what will and won't
happen over the course of using unicode identifiers is quite the
prediction.


  Identically drawn glyphs are a problem, and pretending that they aren't
  a problem, doesn't make it so.  Right now, all possible name glyphs are
  visually distinct
 
 Not at all: Just compare Fool and Foo1 (and perhaps FooI)
 
 In the font in which I'm typing this, these are slightly different - but
 there are fonts in which the difference is really difficult to
 recognize.

Indeed, they are similar, but _different_ in my font as well.  The trick
is that the glyphs are not different in the case of certain greek or
cyrillic letters.  They don't just /look/ similar they /are identical/.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 Indeed, they are similar, but _different_ in my font as well.  The trick
 is that the glyphs are not different in the case of certain greek or
 cyrillic letters.  They don't just /look/ similar they /are identical/.

Well, in the font I'm using to read this email, I and l are /identical/.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Josiah Carlson

Guido van Rossum [EMAIL PROTECTED] wrote:
 
 On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
  Indeed, they are similar, but _different_ in my font as well.  The trick
  is that the glyphs are not different in the case of certain greek or
  cyrillic letters.  They don't just /look/ similar they /are identical/.
 
 Well, in the font I'm using to read this email, I and l are /identical/.

In all fonts I've seen, E/Epsilon/Ie are /always identical/.
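The point can be checked at the code-point level: the three letters may share a glyph, but they are distinct characters, so names differing only in them are different identifiers. A short Python 3 sketch also shows that NFKC normalization does not fold them together, so normalizing identifiers would not remove this class of look-alike:

```python
import unicodedata

look_alikes = ["E", "\u0395", "\u0415"]

for ch in look_alikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0045 LATIN CAPITAL LETTER E
# U+0395 GREEK CAPITAL LETTER EPSILON
# U+0415 CYRILLIC CAPITAL LETTER IE

# NFKC leaves all three distinct, so they still compare unequal.
normalized = {unicodedata.normalize("NFKC", ch) for ch in look_alikes}
print(len(normalized))  # 3
```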

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Neil Hodgson
Martin v. Löwis:

 This aspect of rendering is often not implemented, though. Web browsers
 do it correctly, see
 ...
 GUI frameworks sometimes do it correctly, sometimes don't; most
 notably, Tk has no good support for RTL text.

   Scintilla does a rough job with this. RTL text is displayed
correctly as the underlying platform libraries (Windows or GTK+/Pango)
handle this aspect when called to draw text. However editing is not
performed correctly with the caret not being placed correctly within
RTL text and other visual glitches. There is interest in the area and
even a funding proposal this week.

   Neil


Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).

2005-10-25 Thread Greg Ewing
Martin v. Löwis wrote:

 For window.draw, people will readily understand that
 they are supposed to use Latin letters. More generally, they will know
 what script to use just from looking at the identifier.

Would it help if an identifier were required to be
made up of letters from the same alphabet, e.g. all
Latin or all Greek or all Cyrillic, but not a mixture?
Then you'd get an immediate error if you accidentally
slipped in a letter from the wrong alphabet.
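Greg's rule can be approximated in a few lines of Python 3. The `unicodedata` module does not expose the Unicode Script property, so this sketch assumes the first word of a letter's character name (LATIN, GREEK, CYRILLIC, ...) is a good enough stand-in; digits and underscores are ignored. `scripts_of` and `is_single_script` are hypothetical names.

```python
import unicodedata


def scripts_of(identifier: str) -> set:
    """Approximate the scripts used by the letters of an identifier.

    Heuristic: the first word of each letter's Unicode name stands in for
    its script. Non-letters such as digits and '_' are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def is_single_script(identifier: str) -> bool:
    """Greg's proposed rule: all letters must come from one script."""
    return len(scripts_of(identifier)) <= 1


print(is_single_script("PEN"))        # True
print(is_single_script("P\u0395N"))   # False (LATIN + GREEK)
print(is_single_script("delta_v1"))   # True
print(is_single_script("\u0394v"))    # False: GREEK Delta next to LATIN v
```

The trade-off shows up quickly: the rule also rejects a name like Δv from the math example earlier in the thread, and under this heuristic ordinary Japanese names mixing kanji (CJK) and hiragana would fail too, so a real rule would need per-language allowances.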

Greg
