Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
M.-A. Lemburg wrote:
> It is important to be able to rely on a default that is used when no special options are given. The decision to use UCS2 or UCS4 is much too important to be left to a configure script.

Should the choice be a runtime decision? I think it should be. That could mean two unicode types, a call similar to sys.setdefaultencoding(), a new unicode extension module, or something else.

BTW, thanks for discussing these issues. I tried to write a patch to the unicode API documentation, but it's hard to know just what to write. I think I can say this: sometimes your strings are UTF-16, so you're working with code units that are not necessarily complete code points; sometimes your strings are UCS4, so you're working with code units that are also complete code points. The choice between UTF-16 and UCS4 is made at the time the Python interpreter is compiled and the default choice varies by operating system and configuration.

Shane
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
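The distinction between code units and code points drawn above can be illustrated with a small sketch. It assumes a modern Python 3 interpreter, where a str is always indexed by code point, so the UTF-16 view is obtained by encoding explicitly. The helper names utf16_code_units and code_points are made up for this illustration and are not part of any Python API.

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units that encode `text`."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points(text):
    """Return the list of Unicode code points in `text`."""
    return [ord(ch) for ch in text]


# 'a' followed by U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the BMP.
sample = "a\U0001D11E"

units = utf16_code_units(sample)   # three code units: the clef needs a surrogate pair
points = code_points(sample)       # two code points

print([hex(u) for u in units])
print([hex(p) for p in points])
```

In the UTF-16 view the second character occupies two code units, neither of which is a complete code point on its own; in the UCS4 view each unit is one code point.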
Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
On May 14, 2005, at 3:05 PM, Shane Hathaway wrote:

> M.-A. Lemburg wrote:
>> It is important to be able to rely on a default that is used when no special options are given. The decision to use UCS2 or UCS4 is much too important to be left to a configure script.
>
> Should the choice be a runtime decision? I think it should be. That could mean two unicode types, a call similar to sys.setdefaultencoding(), a new unicode extension module, or something else.
>
> BTW, thanks for discussing these issues. I tried to write a patch to the unicode API documentation, but it's hard to know just what to write. I think I can say this: sometimes your strings are UTF-16, so you're working with code units that are not necessarily complete code points; sometimes your strings are UCS4, so you're working with code units that are also complete code points. The choice between UTF-16 and UCS4 is made at the time the Python interpreter is compiled and the default choice varies by operating system and configuration.

Well, if you're going to make it runtime, you might as well do it right. Take away the restriction that the unicode type backing store is forced to be a particular encoding (i.e. get rid of PyUnicode_AS_UNICODE) and give it more flexibility.

The implementation of NSString in OpenDarwin's libFoundation http://libfoundation.opendarwin.org/ (BSD license), or the CFString implementation in Apple's CoreFoundation http://developer.apple.com/darwin/cflite.html (APSL) would be an excellent place to look for how this can be done.

Of course, for backwards compatibility reasons, this would have to be a new type that descends from basestring. text would probably be a good name for it. This would be an abstract implementation, where you can make concrete subclasses that actually implement the various operations as necessary. For example, you could have text_ucs2, text_ucs4, text_ascii, text_codec, etc.

The bonus here is you can get people to shut up about space efficient representations, because you can use whatever makes sense.

-bob
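The idea of an abstract text type with concrete subclasses that each choose their own backing store could look roughly like the sketch below. It is only an illustration under stated assumptions: the class names Text, TextAscii, TextUcs2 and TextUcs4, the method code_point_at, and the factory make_text are all hypothetical, it is written in modern Python 3 rather than against the C API of the time, and it is not how NSString, CFString, or CPython implement anything.

```python
from abc import ABC, abstractmethod


class Text(ABC):
    """Hypothetical abstract text type; subclasses pick the backing store."""

    @abstractmethod
    def __len__(self):
        """Number of code points."""

    @abstractmethod
    def code_point_at(self, index):
        """Return the code point at `index` as an int."""

    def __str__(self):
        return "".join(chr(self.code_point_at(i)) for i in range(len(self)))


class _FixedWidthText(Text):
    """Shared logic for stores that use a fixed number of bytes per code point."""

    width = 0            # bytes per code point (set by subclasses)
    codec = ""           # codec producing exactly `width` bytes per code point
    max_code_point = 0   # largest code point the store can hold

    def __init__(self, s):
        if any(ord(ch) > self.max_code_point for ch in s):
            raise ValueError("text does not fit in %s" % type(self).__name__)
        self._data = s.encode(self.codec)

    def __len__(self):
        return len(self._data) // self.width

    def code_point_at(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.width
        return int.from_bytes(self._data[start:start + self.width], "little")

    def storage_bytes(self):
        return len(self._data)


class TextAscii(_FixedWidthText):
    width, codec, max_code_point = 1, "ascii", 0x7F


class TextUcs2(_FixedWidthText):
    # Restricted to the BMP, so UTF-16 never produces surrogate pairs here.
    width, codec, max_code_point = 2, "utf-16-le", 0xFFFF


class TextUcs4(_FixedWidthText):
    width, codec, max_code_point = 4, "utf-32-le", 0x10FFFF


def make_text(s):
    """Pick the narrowest concrete subclass that can hold `s`."""
    highest = max(map(ord, s), default=0)
    for cls in (TextAscii, TextUcs2, TextUcs4):
        if highest <= cls.max_code_point:
            return cls(s)
    raise ValueError("code point out of range")


ascii_text = make_text("hello")
bmp_text = make_text("h\u00e9llo")
wide_text = make_text("a\U0001D11E")
print(type(ascii_text).__name__, ascii_text.storage_bytes())
print(type(bmp_text).__name__, bmp_text.storage_bytes())
print(type(wide_text).__name__, wide_text.storage_bytes())
```

Callers only see the abstract interface, so the space used per character becomes a property of the concrete subclass rather than of the interpreter build.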
Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
M.-A. Lemburg wrote:
> I'm not breaking anything, I'm just correcting the way things have to be configured in an effort to bring back the cross-platform configure default.

Your proposed change will break the build of Python on Redhat/Fedora systems.

> I'm talking about the *configure* default, not the default installation you find on any particular platform (this remains a platform decision to be made by the packagers).

Why is it good to have such a default? Why is that so good that it's better than having Tkinter work by default?

> The main point is that we can no longer tell users: if you run configure without any further options, you will get a UCS2 build of Python.

It's not a matter of no longer telling the users. We currently don't say that in any documentation; if you had been telling users that, you were wrong. ./configure --help says that the default for --enable-unicode is yes.

> I want to restore this fact which was true before Jeff's patch was applied.

I understand that you want that. I'm opposed.

> Telling users to look at the configure script printout to determine whether they have just built a UCS2 or UCS4 is just not right given its implications.

Right. We should tell them what the procedure is that is used.

> It will continue to work - the only change, if any, is to add --enable-unicode=tcl or --enable-unicode=ucs4 (if you know that TCL uses UCS4) to your configure setup. The --enable-unicode=ucs4 configure setting is part of RedHat and SuSE already, so there won't be any changes necessary.

Yes, but users of these systems need to adjust.

Regards,
Martin
Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I'm not breaking anything, I'm just correcting the way things have to be configured in an effort to bring back the cross-platform configure default.
>
> Your proposed change will break the build of Python on Redhat/Fedora systems.

You know that this is not true. Python will happily continue to compile on these systems.

>> I'm talking about the *configure* default, not the default installation you find on any particular platform (this remains a platform decision to be made by the packagers).
>
> Why is it good to have such a default? Why is that so good that it's better than having Tkinter work by default?

It is important to be able to rely on a default that is used when no special options are given. The decision to use UCS2 or UCS4 is much too important to be left to a configure script.

>> The main point is that we can no longer tell users: if you run configure without any further options, you will get a UCS2 build of Python.
>
> It's not a matter of no longer telling the users. We currently don't say that in any documentation; if you had been telling users that, you were wrong. ./configure --help says that the default for --enable-unicode is yes.

Let's see:

http://www.python.org/peps/pep-0100.html
http://www.python.org/peps/pep-0261.html
http://www.python.org/doc/2.2.3/whatsnew/node8.html

Apart from the mention in the What's New document for Python 2.2 and a FAQ entry, the documentation doesn't mention UCS4 at all.

However, you're right: the configure script should print (default is ucs2).

>> I want to restore this fact which was true before Jeff's patch was applied.
>
> I understand that you want that. I'm opposed.

Noted.

>> Telling users to look at the configure script printout to determine whether they have just built a UCS2 or UCS4 is just not right given its implications.
>
> Right. We should tell them what the procedure is that is used.

No, we should make it an explicit decision by the user running the configure script.

BTW, a UCS4 TCL is just as non-standard as a UCS4 Python build. Non-standard build options should never be selected by a configure script all by itself.

>> It will continue to work - the only change, if any, is to add --enable-unicode=tcl or --enable-unicode=ucs4 (if you know that TCL uses UCS4) to your configure setup. The --enable-unicode=ucs4 configure setting is part of RedHat and SuSE already, so there won't be any changes necessary.
>
> Yes, but users of these systems need to adjust.

Not really: they won't even notice the change in the configure script if they use the system-provided Python versions. Or am I missing something?

Regardless of all this discussion, I think we should try to get _tkinter.c to work with a UCS4 TCL version as well. The conversion from UCS4 (Python) to UCS2 (TCL) is already integrated, so adding support for the other way around should be rather straightforward. Any takers?

Regards,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 13 2005)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
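For readers unfamiliar with what converting between 32-bit and 16-bit representations involves, here is a generic sketch using UTF-16 surrogate pairs. It is not the code in _tkinter.c, and the thread does not say how _tkinter treats characters outside the BMP; the function names to_utf16_units and from_utf16_units are invented for this example.

```python
def to_utf16_units(code_points):
    """Convert a sequence of code points (ints) to 16-bit UTF-16 code units."""
    units = []
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("code point out of range: %r" % (cp,))
        if cp <= 0xFFFF:
            units.append(cp)                     # fits in one unit
        else:
            offset = cp - 0x10000                # 20 bits to split
            units.append(0xD800 + (offset >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low (trail) surrogate
    return units


def from_utf16_units(units):
    """Convert 16-bit UTF-16 code units back to code points."""
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_lead = 0xD800 <= unit <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_lead and has_trail:
            points.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            points.append(unit)  # BMP character, or an unpaired surrogate passed through
            i += 1
    return points


wide = [0x61, 0x1D11E, 0x20AC]      # 'a', MUSICAL SYMBOL G CLEF, EURO SIGN
narrow = to_utf16_units(wide)
round_trip = from_utf16_units(narrow)
print([hex(u) for u in narrow])
```

The narrowing direction is where information can be lost if an implementation chooses to reject or truncate non-BMP characters instead of emitting surrogate pairs; the widening direction only has to recombine pairs.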
Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I think we should remove the defaulting to whatever TCL uses and instead warn the user about a possible problem in case TCL is found and uses a Unicode width which is incompatible with Python's choice.
>
> -1.

Martin, please reconsider... the choice is between:

a) We have a cross-platform default Unicode width setting of UCS2.

b) The default Unicode width is undefined and the only thing we can tell the user is: Run the configure script and then try the interpreter to check whether you've got a UCS2 or UCS4 build.

Option b) is what the current build system implements and causes problems since the binary interface of the interpreter changes depending on the width of Py_UNICODE making UCS2 and UCS4 builds incompatible.

I want to change the --enable-unicode switch back to always use UCS2 as default and add a new option value tcl which then triggers the behavior you've added to support _tkinter, ie. --enable-unicode=tcl bases the decision to use UCS2 or UCS4 on the installed TCL interpreter (if there is one).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 10 2005)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
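"Try the interpreter to check" in option b) usually comes down to inspecting sys.maxunicode, which is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. A minimal sketch follows; the function name unicode_build_width is made up here. Note that since Python 3.3 the value is always 0x10FFFF, so the distinction only shows up on older interpreters such as the ones discussed in this thread.

```python
import sys


def unicode_build_width(maxunicode=None):
    """Classify an interpreter build from its sys.maxunicode value.

    The optional argument exists only so the function can be exercised
    with values from other builds; by default the running interpreter
    is inspected.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if maxunicode == 0xFFFF:
        return "UCS2"
    if maxunicode == 0x10FFFF:
        return "UCS4"
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))


print(unicode_build_width())
```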
Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)
M.-A. Lemburg wrote:
> Martin, please reconsider... the choice is between:

The point is that this all was discussed, and decided the other way 'round. There is no point in going back and forth between the two choices:

http://mail.python.org/pipermail/python-dev/2003-June/036461.html

If we remove the code, people will *again* report that _tkinter stops building on Redhat (see #719880). I see no value in breaking what works now.

> a) We have a cross-platform default Unicode width setting of UCS2.

It is hardly the default anymore cross-platform. Many installations on Linux are built as UCS-4 now - no matter what configure does.

> b) The default Unicode width is undefined and the only thing we can tell the user is: Run the configure script and then try the interpreter to check whether you've got a UCS2 or UCS4 build.

It's not at all undefined. There is a precise, deterministic, repeatable algorithm that determines the default, and if people want to know, we can tell them.

> I want to change the --enable-unicode switch back to always use UCS2 as default and add a new option value tcl which then triggers the behavior you've added to support _tkinter, ie. --enable-unicode=tcl bases the decision to use UCS2 or UCS4 on the installed TCL interpreter (if there is one).

Please don't - unless you also go back and re-open the bug reports, change the documentation, tell the Linux packagers that settings have changed, and so on. Why deliberately break what currently works?

Regards,
Martin
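The two positions in this exchange can be modelled as two small decision rules. The sketch below is only a reading of the thread, not a transcription of the configure script: the function resolve_width and its arguments are hypothetical, and falling back to UCS2 when no Tcl installation is found is an assumption the messages do not confirm.

```python
def resolve_width(option, tcl_width, policy):
    """Model how --enable-unicode might map to a Py_UNICODE width.

    option:    value given to --enable-unicode ("yes" when none is given)
    tcl_width: "ucs2", "ucs4", or None if no Tcl installation was found
    policy:    "current"  - the default follows Tcl, as described in the thread
               "proposed" - the default is always UCS2; "tcl" opts in to following Tcl
    """
    if policy not in ("current", "proposed"):
        raise ValueError("unknown policy: %r" % (policy,))
    if option in ("ucs2", "ucs4"):
        return option                      # explicit choice always wins
    if option == "yes":
        if policy == "current":
            return tcl_width or "ucs2"     # assumption: UCS2 when Tcl is absent
        return "ucs2"
    if option == "tcl" and policy == "proposed":
        return tcl_width or "ucs2"         # assumption: UCS2 when Tcl is absent
    raise ValueError("unsupported option %r under policy %r" % (option, policy))


# A system whose Tcl was built with 4-byte characters:
print(resolve_width("yes", "ucs4", "current"))
print(resolve_width("yes", "ucs4", "proposed"))
print(resolve_width("tcl", "ucs4", "proposed"))
```

Both rules are deterministic; they differ in whether a bare configure run can produce different widths on different machines.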