Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-14 Thread Shane Hathaway
M.-A. Lemburg wrote:
 It is important to be able to rely on a default that
 is used when no special options are given. The decision
 to use UCS2 or UCS4 is much too important to be
 left to a configure script.

Should the choice be a runtime decision?  I think it should be.  That
could mean two unicode types, a call similar to
sys.setdefaultencoding(), a new unicode extension module, or something else.

BTW, thanks for discussing these issues.  I tried to write a patch to
the unicode API documentation, but it's hard to know just what to write.
I think I can say this: sometimes your strings are UTF-16, so you're
working with code units that are not necessarily complete code points;
sometimes your strings are UCS4, so you're working with code units that
are also complete code points.  The choice between UTF-16 and UCS4 is
made at the time the Python interpreter is compiled and the default
choice varies by operating system and configuration.
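A minimal sketch of the code unit vs. code point distinction described above, assuming Python 3 syntax. On a UTF-16 (narrow) build, a code point outside the Basic Multilingual Plane is stored as two code units (a surrogate pair); a UCS4 (wide) build stores it as one. The helper name `to_utf16_code_units` is hypothetical; the arithmetic is the standard UTF-16 surrogate encoding.

```python
def to_utf16_code_units(code_point):
    """Return the UTF-16 code units a narrow (UCS2/UTF-16) build stores for one code point.

    Hypothetical helper for illustration; a wide (UCS4) build stores
    every code point as a single 32-bit code unit instead.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % code_point)
    if code_point < 0x10000:
        # Basic Multilingual Plane: one code unit is also one complete code point.
        return [code_point]
    # Supplementary planes: split the remaining 20 bits into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return [high, low]


# U+1D11E MUSICAL SYMBOL G CLEF: one code point, but two UTF-16 code units.
print([hex(u) for u in to_utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
print(len(to_utf16_code_units(ord("A"))))              # 1
```

Code that indexes a narrow-build string therefore sees `len() == 2` for the G clef, while a wide build reports 1.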

Shane
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-14 Thread Bob Ippolito

On May 14, 2005, at 3:05 PM, Shane Hathaway wrote:

 M.-A. Lemburg wrote:

 It is important to be able to rely on a default that
 is used when no special options are given. The decision
 to use UCS2 or UCS4 is much too important to be
 left to a configure script.


 Should the choice be a runtime decision?  I think it should be.  That
 could mean two unicode types, a call similar to
 sys.setdefaultencoding(), a new unicode extension module, or something else.

 BTW, thanks for discussing these issues.  I tried to write a patch to
 the unicode API documentation, but it's hard to know just what to write.
 I think I can say this: sometimes your strings are UTF-16, so you're
 working with code units that are not necessarily complete code points;
 sometimes your strings are UCS4, so you're working with code units that
 are also complete code points.  The choice between UTF-16 and UCS4 is
 made at the time the Python interpreter is compiled and the default
 choice varies by operating system and configuration.

Well, if you're going to make it runtime, you might as well do it  
right.  Take away the restriction that the unicode type backing store  
is forced to be a particular encoding (i.e. get rid of  
PyUnicode_AS_UNICODE) and give it more flexibility.

The implementation of NSString in OpenDarwin's libFoundation
http://libfoundation.opendarwin.org/ (BSD license), or the CFString
implementation in Apple's CoreFoundation
http://developer.apple.com/darwin/cflite.html (APSL) would be an
excellent place to look for how this can be done.

Of course, for backwards compatibility reasons, this would have to be  
a new type that descends from basestring.  text would probably be a  
good name for it.  This would be an abstract implementation, where  
you can make concrete subclasses that actually implement the various  
operations as necessary.  For example, you could have text_ucs2,  
text_ucs4, text_ascii, text_codec, etc.

The bonus here is that you can get people to shut up about space-efficient
representations, because you can use whatever makes sense.
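A minimal sketch of the abstract-text design described above, assuming Python 3. The names `Text`, `TextASCII`, `TextUCS2`, `TextUCS4` and the factory `make_text` are hypothetical stand-ins for the proposed `text`, `text_ascii`, `text_ucs2` and `text_ucs4` types (a `text_codec` variant is omitted). Here `TextUCS2` is pure UCS-2, holding only BMP code points with no surrogate pairs.

```python
from abc import ABC, abstractmethod
from array import array


class Text(ABC):
    """Hypothetical abstract text type: callers index code points,
    never the concrete backing store."""

    @abstractmethod
    def __len__(self):
        """Number of code points."""

    @abstractmethod
    def code_point(self, index):
        """Integer code point at position index."""

    def __str__(self):
        return "".join(chr(self.code_point(i)) for i in range(len(self)))


class TextASCII(Text):
    """One byte per code point; only U+0000..U+007F."""

    def __init__(self, s):
        self._data = s.encode("ascii")  # raises UnicodeEncodeError otherwise

    def __len__(self):
        return len(self._data)

    def code_point(self, index):
        return self._data[index]  # indexing bytes yields an int


class TextUCS2(Text):
    """16-bit units; only BMP code points (no surrogate pairs)."""

    def __init__(self, s):
        # "H" is unsigned short; values above 0xFFFF raise OverflowError.
        self._data = array("H", (ord(c) for c in s))

    def __len__(self):
        return len(self._data)

    def code_point(self, index):
        return self._data[index]


class TextUCS4(Text):
    """At least 32 bits per unit; holds any code point."""

    def __init__(self, s):
        self._data = array("L", (ord(c) for c in s))  # "L" is at least 32 bits

    def __len__(self):
        return len(self._data)

    def code_point(self, index):
        return self._data[index]


def make_text(s):
    """Pick the most compact concrete subclass that can hold every code point in s."""
    widest = max((ord(c) for c in s), default=0)
    if widest < 0x80:
        return TextASCII(s)
    if widest <= 0xFFFF:
        return TextUCS2(s)
    return TextUCS4(s)


for sample in ["abc", "caf\u00e9", "G clef \U0001D11E"]:
    t = make_text(sample)
    print(type(t).__name__, len(t), str(t) == sample)
```

The factory picks the narrowest store that fits, which is the space-efficiency point above, while every subclass exposes the same code-point interface.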

-bob



Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-13 Thread Martin v. Löwis
M.-A. Lemburg wrote:
 I'm not breaking anything, I'm just correcting the
 way things have to be configured in an effort to
 bring back the cross-platform configure default.

Your proposed change will break the build of Python
on Redhat/Fedora systems.

 I'm talking about the *configure* default, not the
 default installation you find on any particular
 platform (this remains a platform decision to be made
 by the packagers).

Why is it good to have such a default? Why is that
so good that it's better than having Tkinter work
by default?

 The main point is that we can no longer tell users:
 if you run configure without any further options,
 you will get a UCS2 build of Python.

It's not a matter of no longer telling users.
We currently don't tell them that in any documentation;
if you had been telling users that, you were wrong.

./configure --help says that the default for
--enable-unicode is yes.

 I want to restore this fact which was true before
 Jeff's patch was applied.

I understand that you want that. I'm opposed.

 Telling users to look at the configure script printout
 to determine whether they have just built a UCS2
 or UCS4 is just not right given its implications.

Right. We should tell them what procedure is used.

 It will continue to work - the only change, if any,
 is to add --enable-unicode=tcl or --enable-unicode=ucs4
 (if you know that TCL uses UCS4) to your configure
 setup. The --enable-unicode=ucs4 configure setting
 is part of RedHat and SuSE already, so there won't
 be any changes necessary.

Yes, but users of these systems need to adjust.

Regards,
Martin


Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-13 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
I'm not breaking anything, I'm just correcting the
way things have to be configured in an effort to
bring back the cross-platform configure default.
 
 Your proposed change will break the build of Python
 on Redhat/Fedora systems.

You know that this is not true. Python will happily
continue to compile on these systems.

I'm talking about the *configure* default, not the
default installation you find on any particular
platform (this remains a platform decision to be made
by the packagers).
 
 
 Why is it good to have such a default? Why is that
 so good that it's better than having Tkinter work
 by default?

It is important to be able to rely on a default that
is used when no special options are given. The decision
to use UCS2 or UCS4 is much too important to be
left to a configure script.

The main point is that we can no longer tell users:
if you run configure without any further options,
you will get a UCS2 build of Python.
 
 
 It's not a matter of no longer telling users.
 We currently don't tell them that in any documentation;
 if you had been telling users that, you were wrong.

 ./configure --help says that the default for
 --enable-unicode is yes.

Let's see:
http://www.python.org/peps/pep-0100.html
http://www.python.org/peps/pep-0261.html
http://www.python.org/doc/2.2.3/whatsnew/node8.html

Apart from the mention in the What's New document for
Python 2.2 and a FAQ entry, the documentation doesn't
mention UCS4 at all.

However, you're right: the configure script should print
(default is ucs2).

I want to restore this fact which was true before
Jeff's patch was applied.
 
 
 I understand that you want that. I'm opposed.

Noted.

Telling users to look at the configure script printout
to determine whether they have just built a UCS2
or UCS4 is just not right given its implications.
 
 Right. We should tell them what procedure is used.

No, we should make it an explicit decision by the
user running the configure script.

BTW, a UCS4 TCL is just as non-standard as a UCS4
Python build. Non-standard build options should never be
selected by a configure script all by itself.

It will continue to work - the only change, if any,
is to add --enable-unicode=tcl or --enable-unicode=ucs4
(if you know that TCL uses UCS4) to your configure
setup. The --enable-unicode=ucs4 configure setting
is part of RedHat and SuSE already, so there won't
be any changes necessary.
 
 Yes, but users of these systems need to adjust.

Not really: they won't even notice the change in the
configure script if they use the system provided Python
versions. Or am I missing something?


Regardless of all this discussion, I think we should
try to get _tkinter.c to work with a UCS4 TCL version
as well. The conversion from UCS4 (Python) to UCS2 (TCL)
is already integrated, so adding support for the other way
around should be rather straightforward.

Any takers?
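A minimal sketch of the conversion arithmetic, assuming that "the other way around" means a UCS2 (UTF-16) Python talking to a UCS4 Tcl: each valid surrogate pair in the Python string must be combined into one UCS4 code point. This illustrates the logic only; it is not the code in _tkinter.c, and the function name is hypothetical.

```python
def utf16_units_to_ucs4(units):
    """Combine UTF-16 surrogate pairs into UCS4 code points.

    Hypothetical helper: BMP units and unpaired surrogates are passed
    through unchanged rather than rejected.
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            # High surrogate carries the top 10 bits, low surrogate the bottom 10.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result


# "A" followed by the surrogate pair for U+1D11E MUSICAL SYMBOL G CLEF.
print([hex(cp) for cp in utf16_units_to_ucs4([0x41, 0xD834, 0xDD1E])])  # ['0x41', '0x1d11e']
```

Passing unpaired surrogates through is one design choice; a stricter converter could raise an error instead.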

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 13 2005)
  Python/Zope Consulting and Support ...http://www.egenix.com/
  mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
  mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-10 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
I think we should remove the defaulting to whatever
TCL uses and instead warn the user about a possible
problem in case TCL is found and uses a Unicode
width which is incompatible with Python's choice.
 
 -1.

Martin, please reconsider... the choice is between:

a) We have a cross-platform default Unicode width
   setting of UCS2.

b) The default Unicode width is undefined and the only
   thing we can tell the user is:

   Run the configure script and then try the interpreter
   to check whether you've got a UCS2 or UCS4 build.

Option b) is what the current build system implements,
and it causes problems because the binary interface of the
interpreter changes depending on the width of Py_UNICODE,
making UCS2 and UCS4 builds incompatible.
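A minimal sketch of "try the interpreter to check", written to run unchanged on Python 2 and 3. It relies on `sys.maxunicode`, which is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build of CPython 2.2 through 3.2; from Python 3.3 onward (PEP 393) it is always 0x10FFFF. The second helper, whose name is hypothetical, shows why the builds are binary-incompatible: Python 2's unicodeobject.h renames the Unicode C API with a width-specific `PyUnicodeUCS2_` or `PyUnicodeUCS4_` prefix.

```python
import sys


def unicode_build_width():
    """Return "UCS2" or "UCS4" for the running interpreter.

    On CPython 3.3+ sys.maxunicode is always 0x10FFFF, so this always
    reports "UCS4" there.
    """
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


def unicode_api_prefix(width):
    """Symbol prefix that Python 2's unicodeobject.h maps the Unicode C API to."""
    return {"UCS2": "PyUnicodeUCS2_", "UCS4": "PyUnicodeUCS4_"}[width]


width = unicode_build_width()
print("%s %s" % (width, hex(sys.maxunicode)))
# An extension module compiled against one width references symbols such as
# PyUnicodeUCS2_FromUnicode, which a build of the other width does not export.
print(unicode_api_prefix(width) + "FromUnicode")
```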

I want to change the --enable-unicode switch back to
always use UCS2 as default and add a new option value
tcl which then triggers the behavior you've added to
support _tkinter, i.e.

--enable-unicode=tcl

bases the decision to use UCS2 or UCS4 on the installed
TCL interpreter (if there is one).
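A minimal sketch contrasting the two defaults argued over in this thread, modeled only on how the participants describe them, not on the actual configure.in code. The function names are hypothetical; `tcl_uses_ucs4` should be False when no Tcl is installed. Only the width-selecting values of `--enable-unicode` are modeled: `yes` (the default), `ucs2`, `ucs4`, and the proposed `tcl`.

```python
def choose_width_current(enable_unicode, tcl_uses_ucs4):
    """Current behavior as described: the default "yes" follows an installed UCS4 Tcl."""
    if enable_unicode in ("ucs2", "ucs4"):
        return enable_unicode  # an explicit width always wins
    if enable_unicode == "yes":
        return "ucs4" if tcl_uses_ucs4 else "ucs2"
    raise ValueError("value not modeled: %r" % enable_unicode)


def choose_width_proposed(enable_unicode, tcl_uses_ucs4):
    """Proposed behavior: "yes" always means UCS2; the new "tcl" value follows Tcl."""
    if enable_unicode in ("ucs2", "ucs4"):
        return enable_unicode
    if enable_unicode == "yes":
        return "ucs2"
    if enable_unicode == "tcl":
        return "ucs4" if tcl_uses_ucs4 else "ucs2"
    raise ValueError("value not modeled: %r" % enable_unicode)


for tcl in (False, True):
    print("Tcl is UCS4=%s: current=%s proposed=%s proposed+tcl=%s" % (
        tcl,
        choose_width_current("yes", tcl),
        choose_width_proposed("yes", tcl),
        choose_width_proposed("tcl", tcl)))
```

The two models differ only when a UCS4 Tcl is installed and no explicit width is given, which is exactly the case under dispute.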

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-10 Thread Martin v. Löwis
M.-A. Lemburg wrote:
 Martin, please reconsider... the choice is between:

The point is that this all was discussed, and decided the
other way 'round. There is no point in going back and forth
between the two choices:

http://mail.python.org/pipermail/python-dev/2003-June/036461.html

If we remove the code, people will *again* report that
_tkinter stops building on Redhat (see #719880). I
see no value in breaking what works now.

 a) We have a cross-platform default Unicode width
setting of UCS2.

It is hardly the default anymore cross-platform. Many
installations on Linux are built as UCS-4 now - no
matter what configure does.

 b) The default Unicode width is undefined and the only
thing we can tell the user is:
 
Run the configure script and then try the interpreter
to check whether you've got a UCS2 or UCS4 build.

It's not at all undefined. There is a precise, deterministic,
repeatable algorithm that determines the default, and
if people want to know, we can tell them.

 I want to change the --enable-unicode switch back to
 always use UCS2 as default and add a new option value
 tcl which then triggers the behavior you've added to
 support _tkinter, i.e.
 
 --enable-unicode=tcl
 
 bases the decision to use UCS2 or UCS4 on the installed
 TCL interpreter (if there is one).

Please don't - unless you also go back and re-open the
bug reports, change the documentation, tell the Linux
packagers that settings have changed, and so on.

Why deliberately break what currently works?

Regards,
Martin