[Python-Dev] Removing --with-wctype-functions support

2004-12-03 Thread M.-A. Lemburg
I would like to remove the support for using libc wctype functions (e.g. towupper(), towlower(), etc.) from the code base. The reason is that compiling Python using this switch not only breaks the test suite, it also causes the functions .lower() and .upper() to become locale aware and creates

Re: [Python-Dev] Deprecated xmllib module

2004-12-13 Thread M.-A. Lemburg
Martin v. Löwis wrote: As for PEP 4: I don't know whether it needs to be listed there. It appears that the PEP is largely unmaintained (I, personally, do not really maintain it). So one option would be to just stop using PEP 4 for recording deprecations, since we now have the warnings module. If we

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Andrew McNamara wrote: There's a bunch of jobs we (CSV module maintainers) have been putting off - attached is a list (in no particular order): * unicode support (this will probably uglify the code considerably). Can you please elaborate on that? What needs to be done, and

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread M.-A. Lemburg
using pre-processor macros). Quite a large job. Suggestions gratefully received. M.-A. Lemburg wrote: Indeed. The trick is to convert to Unicode early and to use Unicode literals instead of string literals in the code. Yes, although it would be nice to also retain the 8-bit versions as well. You can

Re: [Python-Dev] csv module TODO list

2005-01-05 Thread M.-A. Lemburg
Andrew McNamara wrote: Yes, although it would be nice to also retain the 8-bit versions as well. You can do so by using latin-1 as default encoding. Works great ! Yep, although that means we wear the cost of decoding and encoding for all 8 bit input. Right, but it makes the code very clean and
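
Editorial sketch, not part of the thread: the latin-1 suggestion relies on the fact that Latin-1 maps every byte value 0-255 to the code point with the same number, so arbitrary 8-bit data survives a decode/encode round trip unchanged. The variable names are illustrative, and modern Python 3 is assumed rather than the Python 2 "default encoding" mechanism the posters are discussing.

```python
# Every possible byte value, 0 through 255.
raw = bytes(range(256))

# Decoding as Latin-1 never fails: byte N becomes code point N.
text = raw.decode("latin-1")

# Encoding back yields exactly the original bytes.
round_tripped = text.encode("latin-1")

assert round_tripped == raw
assert [ord(ch) for ch in text] == list(raw)
```

The cost Andrew mentions is visible here too: every 8-bit input pays for one decode and one encode pass.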

Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-17 Thread M.-A. Lemburg
Nick Coghlan wrote: Guido van Rossum wrote: What do people think? (My main motivation for this, as stated before, is that it adds complexity without much benefit.) I'm in favour, since it removes the "an unbound method is almost like a bare function, only not quite as useful" distinction. It would

Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-17 Thread M.-A. Lemburg
Guido van Rossum wrote: Apart from the tests that were testing the behavior of im_class, I found only a single piece of code in the standard library that used im_class of an unbound method object (the clever test in the pyclbr test). Uses of im_self and im_func were more widespread. Given the

Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-18 Thread M.-A. Lemburg
Guido van Rossum wrote: [Guido] Apart from the tests that were testing the behavior of im_class, I found only a single piece of code in the standard library that used im_class of an unbound method object (the clever test in the pyclbr test). Uses of im_self and im_func were more widespread. Given

Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-19 Thread M.-A. Lemburg
Guido van Rossum wrote: [me] I'm not sure I understand how basemethod is supposed to work; I can't find docs for it using Google (only three hits for the query mxTools basemethod). How does it depend on im_class? [Marc-Andre] It uses im_class to find the class defining the (unbound) method: def

Re: [Python-Dev] __str__ vs. __unicode__

2005-01-19 Thread M.-A. Lemburg
Walter Dörwald wrote: M.-A. Lemburg wrote: So the question is whether conversion of a Unicode sub-type to a true Unicode object should honor __unicode__ or not. The same question can be asked for many other types, e.g. floats (and __float__), integers (and __int__), etc. class float2(float

Re: [Python-Dev] __str__ vs. __unicode__

2005-01-23 Thread M.-A. Lemburg
Walter Dörwald wrote: M.-A. Lemburg wrote: [...] __str__ and __unicode__ as well as the other hooks were specifically added for the type constructors to use. However, these were added at a time where sub-classing of types was not possible, so it's time now to reconsider whether

Re: [Python-Dev] test_codecs failing

2005-02-08 Thread M.-A. Lemburg
Walter Dörwald wrote: Raymond Hettinger wrote: The most recent test_codecs check-in (1.19) is failing on an MSVC 6.0 compilation running on WinMe: -- Ran 35 tests in 1.430s FAILED (failures=1) Traceback (most recent call last):

Re: [Python-Dev] Prospective Peephole Transformation

2005-02-18 Thread M.-A. Lemburg
Raymond Hettinger wrote: Based on some ideas from Skip, I had tried transforming the likes of x in (1,2,3) into x in frozenset([1,2,3]). When applicable, it substantially simplified the generated code and converted the O(n) lookup into an O(1) step. There were substantial savings even if the set
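
Editorial sketch, not part of the thread: the idea Raymond describes is to replace a linear scan of a constant tuple with a hash lookup in a frozenset. The function and constant names below are hypothetical, and the code only illustrates the two membership tests at the Python level, not the peephole optimizer itself.

```python
# Membership in a tuple compares against each element in turn: O(n).
SMALL_TUPLE = (1, 2, 3)

# Membership in a frozenset is a hash lookup: O(1) on average.
SMALL_SET = frozenset(SMALL_TUPLE)


def in_tuple(x):
    return x in SMALL_TUPLE


def in_frozenset(x):
    return x in SMALL_SET


# For hashable values the two tests agree.
for value in (0, 1, 2, 3, 4):
    assert in_tuple(value) == in_frozenset(value)

# They differ for unhashable values: the tuple test returns False,
# while the frozenset test raises TypeError.
assert in_tuple([1]) is False
try:
    in_frozenset([1])
except TypeError:
    unhashable_raises = True
else:
    unhashable_raises = False
assert unhashable_raises
```

The unhashable case is one reason such a rewrite cannot be applied blindly when the type of `x` is unknown.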

Re: [Python-Dev] os.access and Unicode

2005-03-08 Thread M.-A. Lemburg
Brett C. wrote: Martin v. Löwis wrote: Apparently, os.access was forgotten when the file system encoding was introduced in Python 2.2, and then it was again forgotten in PEP 277. I've now fixed it in the trunk (posixmodule.c:2.334), and I wonder whether this is a backport candidate. People who try

Re: [Python-Dev] unicode inconsistency?

2005-03-09 Thread M.-A. Lemburg
Neil Schemenauer wrote: On Wed, Mar 09, 2005 at 11:10:59AM +0100, M.-A. Lemburg wrote: The patch implements the PyObject_Text() idea (an API that returns a basestring instance, i.e. string or unicode) and then uses this in '%s' (the string version) to properly propagate to u'%s' (the unicode

Re: [Python-Dev] Decimal returning NotImplemented (or not)

2005-03-10 Thread M.-A. Lemburg
Nick Coghlan wrote: Guido van Rossum wrote: No, the reason is that if we did this with exceptions, it would be liable to mask errors; an exception does not necessarily originate immediately with the code you invoked, it could have been raised by something else that was invoked by that code. The

Re: [Python-Dev] os.access and Unicode

2005-03-11 Thread M.-A. Lemburg
Martin v. Löwis wrote: Skip Montanaro wrote: I say backport. If people were trying to call os.access with unicode filenames it would have been failing and they were either avoiding unicode filenames as a result or working around it some other way. I can't see how making os.access work with

Re: [Python-Dev] os.access and Unicode

2005-03-11 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: The question is whether it would encourage conditional work-arounds. -1. That only makes the code more complicated. You misunderstand. I'm not proposing that the work-around is added to Python. I'm saying that Python *users* might introduce such work

Re: [Python-Dev] Adding any() and all()

2005-03-11 Thread M.-A. Lemburg
Raymond Hettinger wrote: BTW I definitely expect having to defend removing map/filter/reduce/lambda with a PEP; that's much more controversial because it's *removing* something and hence by definition breaking code. +1 on the PEP -1 on removing those tools - breaks too much code. I suspect that

Re: [Python-Dev] Adding any() and all()

2005-03-11 Thread M.-A. Lemburg
Guido van Rossum wrote: Here's my take on the key issues brought up: Alternative names anytrue(), alltrue(): before I posted to my blog I played with these names (actually anyTrue(), allTrue(), anyFalse(), allFalse()). But I realized (1) any() and all() read much better in their natural context

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-01 Thread M.-A. Lemburg
Evan Jones wrote: I recently rediscovered this strange behaviour in Python's Unicode handling. I *think* it is a bug, but before I go and try to hack together a patch, I figure I should run it by the experts here on Python-Dev. If you understand Unicode, please let me know if there are

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Stephen J. Turnbull wrote: So there is a standard for the UTF-8 signature, and I know of applications which produce it. While I agree with you that Python's codecs shouldn't produce it (by default), providing an option to strip is a good idea. I would personally
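
Editorial sketch, not part of the thread: current Python ships a `utf-8-sig` codec that strips a leading UTF-8 signature on decoding, while the plain `utf-8` codec leaves it in place as U+FEFF. Whether this matches what the posters had in mind is an assumption; the snippet above is truncated. Variable names are illustrative.

```python
import codecs

# UTF-8 data preceded by the three-byte signature EF BB BF.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain codec keeps the signature as a leading U+FEFF character.
plain = data.decode("utf-8")

# The signature-aware codec removes it.
stripped = data.decode("utf-8-sig")

assert plain == "\ufeffhello"
assert stripped == "hello"

# Input without a signature decodes identically with both codecs.
assert b"hello".decode("utf-8-sig") == b"hello".decode("utf-8")
```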

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-05 Thread M.-A. Lemburg
Stephen J. Turnbull wrote: MAL == M [EMAIL PROTECTED] writes: MAL The BOM (byte order mark) was a non-standard Microsoft MAL invention to detect Unicode text data as such (MS always uses MAL UTF-16-LE for Unicode text files). The Japanese memopado (Notepad) uses UTF-8

Re: [Python-Dev] Unicode byte order mark decoding

2005-04-07 Thread M.-A. Lemburg
Nicholas Bastin wrote: On Apr 7, 2005, at 5:07 AM, M.-A. Lemburg wrote: The current implementation of the utf-16 codecs makes for some irritating gymnastics to write the BOM into the file before reading it if it contains no BOM, which seems quite like a bug in the codec. The codec

Re: [Python-Dev] Security capabilities in Python

2005-04-18 Thread M.-A. Lemburg
Eyal Lotem wrote: I would like to experiment with security based on Python references as security capabilities. Unfortunately, there are several problems that make Python references invalid as capabilities: * There is no way to create secure proxies because there are no private attributes. * Lots

[Python-Dev] Re: switch statement

2005-04-20 Thread M.-A. Lemburg
Fredrik Lundh wrote: PS. a side effect of the for-in pattern is that I'm beginning to feel that Python might need a nice switch statement based on dictionary lookups, so I can replace multiple callbacks with a single loop body, without writing too many if/elif clauses. PEP 275 anyone ?

Re: [Python-Dev] Re: switch statement

2005-04-25 Thread M.-A. Lemburg
Shannon -jj Behrens wrote: On 4/20/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Fredrik Lundh wrote: PS. a side effect of the for-in pattern is that I'm beginning to feel that Python might need a nice switch statement based on dictionary lookups, so I can replace multiple callbacks with a single

Re: [Python-Dev] Py_UNICODE madness

2005-05-04 Thread M.-A. Lemburg
Nicholas Bastin wrote: The documentation for Py_UNICODE states the following: This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Nicholas Bastin wrote: On May 4, 2005, at 6:20 PM, Shane Hathaway wrote: Nicholas Bastin wrote: This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size of this

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Fredrik Lundh wrote: Thomas Heller wrote: AFAIK, you can configure Python to use 16-bit or 32-bit Unicode chars, independent of the size of wchar_t. The HAVE_USABLE_WCHAR_T macro can be used by extension writers to determine if Py_UNICODE is the same as wchar_t. note that usable is

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Thread M.-A. Lemburg
Nicholas Bastin wrote: On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote: Nicholas Bastin wrote: This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size of this

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Shane Hathaway wrote: Martin v. Löwis wrote: Shane Hathaway wrote: I agree that UCS4 is needed. There is a balancing act here; UTF-16 is widely used and takes less space, while UCS4 is easier to treat as an array of characters. Maybe we can have both: unicode objects start with an internal

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: Hmm, looking at the configure.in script, it seems you're right. I wonder why this weird dependency on TCL was added. If Python is configured for UCS-2, and Tcl for UCS-4, then Tkinter would not work out of the box. Hence the weird dependency. I

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Nicholas Bastin wrote: On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote: With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start supporting the full Unicode ccs the same way it supports UCS-2. Individual surrogate values remain accessible, and supporting non-BMP characters is left to

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Nicholas Bastin wrote: On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote: However, I don't understand all the excitement about Py_UNICODE: if you don't like the way this Python typedef works, you are free to interface to Python using any of the supported encodings using PyUnicode_Encode

Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-10 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I think we should remove the defaulting to whatever TCL uses and instead warn the user about a possible problem in case TCL is found and uses a Unicode width which is incompatible with Python's choice. -1. Martin, please reconsider... the choice

Re: [Python-Dev] Python's Unicode width default (New Py_UNICODE doc)

2005-05-13 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I'm not breaking anything, I'm just correcting the way things have to be configured in an effort to bring back the cross-platform configure default. Your proposed change will break the build of Python on Redhat/Fedora systems. You know

Re: [Python-Dev] 'With' context documentation draft (was Re: Terminology for PEP 343

2005-07-08 Thread M.-A. Lemburg
Nick Coghlan wrote: OK, here's some draft documentation using Phillip's context terminology. I think it works very well. With Statements and Context Management A frequent need in programming is to ensure a particular action is taken after a specific section of code has been executed

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-09 Thread M.-A. Lemburg
Neil Hodgson wrote: Thomas Heller: But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow to import this file as module. Internally Python\import.c converts everything to strings. I started to refactor import.c to work with PyStringObjects instead of char buffers as a first step -

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-09 Thread M.-A. Lemburg
Neil Hodgson wrote: M.-A. Lemburg: I don't really buy this trick: what if you happen to have a home directory with Unicode characters in it ? Most people choose account names and thus home directory names that are compatible with their preferred locale settings: German users

Re: [Python-Dev] Triple-quoted strings and indentation

2005-07-11 Thread M.-A. Lemburg
Bob Ippolito wrote: A better proposal would probably be another string prefix that means dedent, but I'm still not sold. doc processing software is clearly going to have to know how to dedent anyway in order to support existing code. Agreed. It is easy enough for any doc-string

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-11 Thread M.-A. Lemburg
Neil Hodgson wrote: On unicode versions of Windows, for attributes like os.listdir, os.getcwd, sys.argv, and os.environ, which can usefully return unicode strings, there are 4 options I see: 1) Always return unicode. This is the option I'd be happiest to use, myself, but expect this

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-12 Thread M.-A. Lemburg
Hi Neil, 2) Return unicode when the text can not be represented in ASCII. This will cause a change of behaviour for existing code which deals with non-ASCII data. +1 on this one (s/ASCII/Python's default encoding). I assume you mean the result of sys.getdefaultencoding() here. Yes. The

Re: [Python-Dev] SF patch #1214889 - file.encoding support

2005-07-14 Thread M.-A. Lemburg
Reinhold Birkenfeld wrote: Hi, would anyone care to comment about this patch of mine -- https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1214889&group_id=5470 It makes file.encoding read-write and lets the write() and writelines() methods obey it. Done. Please see SF. PS:

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-14 Thread M.-A. Lemburg
Hi Neil, With the proposed modification, sys.argv[1] u'\u20ac.txt' is converted through cp1251 Actually, it is not: if you pass in a Unicode argument to one of the file I/O functions and the OS supports Unicode directly or at least provides the notion of a file system encoding, then the file

Re: [Python-Dev] 'With' context documentation draft (was Re: Terminology for PEP 343

2005-07-14 Thread M.-A. Lemburg
Nick Coghlan wrote: M.-A. Lemburg wrote: May I suggest that you use a different name than context for this ?! The term context is way too broad for the application scopes that you have in mind here (like e.g. managing a resource in a multi-threaded application). It's actually the broadness

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-15 Thread M.-A. Lemburg
Martin v. Löwis wrote: Guido van Rossum wrote: Ah, sigh. I didn't know that os.listdir() behaves differently when the argument is Unicode. Does os.listdir(".") really behave differently than os.listdir(u".")? Bah! I don't think that's a very good design (although I see where it comes from).

Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-07-29 Thread M.-A. Lemburg
Martin v. Löwis wrote: I'd like to see the Python source be stored in Subversion instead of CVS, +1 and on python.org instead of sf.net. To facilitate discussion, I have drafted a PEP describing the rationale for doing so, and the technical procedure to be performed. Not sure about the

Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-02 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: The PSF does have a reasonable budget, so why not use it to maintain the infrastructure needed for Python development and let a company do the administration of the needed servers and the importing of the CVS and tracker items

Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-03 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: True, but if we never ask, we'll never know :-) My question was: Would asking a professional hosting company be a reasonable approach ? It would be an option, yes, of course. It's not an approach that *I* would be willing to implement, though

Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-04 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I haven't received any offers to make a qualified statement. I only know that I would oppose an approach to ask somebody but our volunteers to do it for free, and I also know that I don't want to spend my time researching commercial alternatives

Re: [Python-Dev] Generalised String Coercion

2005-08-07 Thread M.-A. Lemburg
Guido van Rossum wrote: My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be allowed to return a Unicode string? The

Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion

2005-08-07 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: BTW, in one of your replies I read that you had a problem with how cvs2svn handles trunk, branches and tags. In reality, this is no problem at all, since Subversion is very good at handling moves within the repository: you can easily change

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread M.-A. Lemburg
Guido van Rossum wrote: [Guido] My first response to the PEP, however, is that instead of a new built-in function, I'd rather relax the requirement that str() return an 8-bit string -- after all, int() is allowed to return a long, so why couldn't str() be allowed to return a Unicode string?

Re: [Python-Dev] Generalised String Coercion

2005-08-08 Thread M.-A. Lemburg
Michael Hudson wrote: M.-A. Lemburg [EMAIL PROTECTED] writes: Set the external encoding for stdin, stdout, stderr: (also an example for adding encoding support to an existing file object): def set_sys_std_encoding(encoding): # Load

Re: [Python-Dev] SWIG and rlcompleter

2005-08-18 Thread M.-A. Lemburg
James Y Knight wrote: On Aug 17, 2005, at 2:55 PM, Timothy Fitz wrote: On 8/16/05, Raymond Hettinger [EMAIL PROTECTED] wrote: -0 The behavior of dir() is already a bit magical. Python is much simpler to comprehend if we have direct relationships like dir() and vars() corresponding as

Re: [Python-Dev] Revised PEP 349: Allow str() to return unicode strings

2005-08-23 Thread M.-A. Lemburg
Thomas Heller wrote: Neil Schemenauer [EMAIL PROTECTED] writes: [Please mail followups to [EMAIL PROTECTED] The PEP has been rewritten based on a suggestion by Guido to change str() rather than adding a new built-in function. Based on my testing, I believe the idea is feasible. It would be

Re: [Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

2005-08-24 Thread M.-A. Lemburg
Walter Dörwald wrote: I wonder if we should switch back to a simple readline() implementation for those codecs that don't require the current implementation (basically every charmap codec). That would be my preference as well. The 2.4 .readline() approach is really only needed for codecs

Re: [Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

2005-08-24 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I think it's worthwhile reconsidering this approach for character type queries that do not involve a huge number of code points. I would advise against that. I measure both versions (your version called PyUnicode_IsLinebreak2) with the following

Re: [Python-Dev] Style for raising exceptions (python-dev Summary for 2005-08-01 through 2005-08-15 [draft])

2005-08-25 Thread M.-A. Lemburg
I must have missed this one: Style for raising exceptions Guido explained that these days exceptions should always be raised as:: raise SomeException(some argument) instead of:: raise SomeException, some argument
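
Editorial sketch, not part of the thread: the call form `raise SomeException(some argument)` is the only one of the two styles that modern Python 3 accepts; the comma form is a syntax error there. `SomeException` and `fail` below are hypothetical names used only for illustration.

```python
class SomeException(Exception):
    """Hypothetical exception type used only for this example."""


def fail():
    # Call-style raise: instantiate the exception with its argument.
    raise SomeException("some argument")


try:
    fail()
except SomeException as exc:
    # The constructor arguments are available on the instance.
    message = exc.args[0]

assert message == "some argument"
```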

Re: [Python-Dev] Remove str.find in 3.0?

2005-08-28 Thread M.-A. Lemburg
Raymond Hettinger wrote: [Guido] Another observation: despite the derogatory remarks about regular expressions, they have one thing going for them: they provide a higher level of abstraction for string parsing, which this is all about. (They are higher level in that you don't have to be

Re: [Python-Dev] Mapping Darwin 8.2.0 to Mac OS X 10.4.2 in platform.py

2005-09-22 Thread M.-A. Lemburg
Ronald Oussoren wrote: On 22-sep-2005, at 5:26, Guido van Rossum wrote: The platform module has a way to map system names such as returned by uname() to marketing names. It maps SunOS to Solaris, for example. But it doesn't map Darwin to Mac OS X. I think I know how to map Darwin version

Re: [Python-Dev] Mapping Darwin 8.2.0 to Mac OS X 10.4.2 in platform.py

2005-09-29 Thread M.-A. Lemburg
Bob Ippolito wrote: /usr/bin/sw_vers technically calls a private (at least undocumented) CoreFoundation API, it doesn't parse that plist directly :) On further inspection, it looks like parsing the plist directly is supported API these days (see the bottom of http://

Re: [Python-Dev] C API doc fix

2005-09-29 Thread M.-A. Lemburg
Steven Bethard wrote: On 9/29/05, Robey Pointer [EMAIL PROTECTED] wrote: Yesterday I ran into a bug in the C API docs. The top of this page: http://docs.python.org/api/unicodeObjects.html says: Py_UNICODE This type represents a 16-bit unsigned storage type which is used by Python

Re: [Python-Dev] C API doc fix

2005-09-29 Thread M.-A. Lemburg
Bob Ippolito wrote: On Sep 29, 2005, at 3:53 PM, M.-A. Lemburg wrote: Perhaps a flag that fires up Python and runs platform.py would help too. python -mplatform Cool :-) Now we only need to add some more information to it (like e.g. the Unicode variant). -- Marc-Andre Lemburg

Re: [Python-Dev] C API doc fix

2005-09-29 Thread M.-A. Lemburg
Fredrik Lundh wrote: M.-A. Lemburg wrote: * Unicode variant (UCS2, UCS4) don't forget the Py_UNICODE is wchar_t subvariant. True, but that's not relevant for binary compatibility of Python package (at least not AFAIK). UCS2 vs. UCS4 matters because the two versions use and expose

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Reinhold Birkenfeld wrote: Martin v. Löwis wrote: Whether we think it should be supported depends on who we is, as with all these minor features: some think it is a waste of time, some think it should be supported if reasonably possible, and some think this a conditio sine qua non. It certainly

Re: [Python-Dev] --disable-unicode (Tests and unicode)

2005-10-03 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: Is the added complexity needed to support not having Unicode support compiled into Python really worth it ? If there are volunteers willing to maintain it, and the other volunteers are not affected: certainly. No objections there. I only see

Re: [Python-Dev] unifying str and unicode

2005-10-03 Thread M.-A. Lemburg
Martin Blais wrote: On 10/3/05, Antoine Pitrou [EMAIL PROTECTED] wrote: If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not unicode safe - to quote your own words - since many functions may return 8-bit strings containing

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-04 Thread M.-A. Lemburg
Walter Dörwald wrote: Am 04.10.2005 um 04:25 schrieb [EMAIL PROTECTED]: As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very slow compared to encoding or decoding with utf-8. Here I'm working with 53k of data instead of 53 megs. (Note: this is a laptop, so it's

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Another option would be to generate a big switch statement in C and let the compiler decide about the best data structure. I would try to avoid generating C code at all costs. Maintaining the build processes will just be a nightmare. We could automate this using

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Hye-Shik Chang wrote: On 10/5/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Of course, a C version could use the same approach as the unicodedatabase module: that of compressed lookup tables... http://aggregate.org/TechPub/lcpc2002.pdf genccodec.py anyone ? I had written a test

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I would try to avoid generating C code at all costs. Maintaining the build processes will just be a nightmare. We could automate this using distutils; however I'm not sure whether this would then also work on Windows. It wouldn't. Could

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: Walter Dörwald wrote: OK, here's a patch that implements this enhancement to PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939 Looks nice! Indeed (except for the choice of the map this character to undefined code point). Hye-Shik, could you please provide

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-05 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: It wouldn't. Could you elaborate why not ? Using distutils on Windows is really easy... The current build process for Windows simply doesn't provide it. You expect to select Build/All from the menu (or some such), and expect all code

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-06 Thread M.-A. Lemburg
Hye-Shik Chang wrote: On 10/6/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Hye-Shik, could you please provide some timeit figures for the fastmap encoding ? Thanks for the timings. (before applying Walter's patch, charmap decoder) % ./python Lib/timeit.py -s s='a'*53*1024; e='iso8859_10

Re: [Python-Dev] Unicode charmap decoders slow

2005-10-14 Thread M.-A. Lemburg
Walter Dörwald wrote: We've already taken care of decoding. What we still need is a new gencodec.py and regenerated codecs. I'll take care of that; just haven't gotten around to it yet. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 14 2005)

Re: [Python-Dev] Questionable AST wibbles

2005-10-21 Thread M.-A. Lemburg
Neal Norwitz wrote: Jeremy, There are a bunch of mods from the AST branch that got integrated into head. Hopefully, by doing this on python-dev more people will get involved. I'll describe high level things first, but there will be a ton of details later on. If people don't want to see

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Neil Hodgson wrote: Guido van Rossum: Folks, please focus on what Python 3000 should do. I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread M.-A. Lemburg
Walter Dörwald wrote: Martin v. Löwis wrote: M.-A. Lemburg wrote: I've checked in a whole bunch of newly generated codecs which now make use of the faster charmap decoding variant added by Walter a short while ago. Please let me know if you find any problems. I think we should work

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Bengt Richter wrote: Please bear with me for a few paragraphs ;-) Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell the parser how to convert string literals (currently only the Unicode ones) into

Re: [Python-Dev] KOI8_U (New codecs checked in)

2005-10-24 Thread M.-A. Lemburg
Walter Dörwald wrote: Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py? KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there. OK, so we'd need something

Re: [Python-Dev] KOI8_U (New codecs checked in)

2005-10-24 Thread M.-A. Lemburg
M.-A. Lemburg wrote: Walter Dörwald wrote: Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py? KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there. OK, so

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread M.-A. Lemburg
Walter Dörwald wrote: I'd like to suggest a small cosmetic change: gencodec.py should output byte values with two hexdigits instead of four. This makes it easier to see what is a byte values and what is a codepoint. And it would make grepping for stuff simpler. True. I'll rerun the creation with

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread M.-A. Lemburg
Guido van Rossum wrote: On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote: I'm implementing a string-like object in an extension module and trying to make it as interoperable with the standard string object as possible. To do this I'm implementing the relevant slots and the buffer interface.

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote: M.-A. Lemburg: Unicode has the concept of combining code points, e.g. you can store an é (e with an accent) as e + '. Now if you slice off the accent, you'll break the character that you encoded using combining code points. ... next_indextype(u, index) - integer
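
Editorial sketch, not part of the thread: the slicing problem described above can be reproduced with the standard `unicodedata` module. The decomposed form of é is two code points (U+0065 followed by the combining acute accent U+0301), and slicing between them silently drops the accent. Variable names are illustrative; modern Python 3 is assumed.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: one visible character, two code points.
decomposed = "e\u0301"
assert len(decomposed) == 2

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "\u00e9"
assert len(composed) == 1

# Slicing the decomposed form by code point cuts the character apart.
sliced = decomposed[:1]
assert sliced == "e"
```

Normalizing to NFC before slicing helps only for sequences that have a precomposed form; it is not a general replacement for the grapheme-aware indexing the thread goes on to discuss.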

Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: I just left them in because I thought they wouldn't do any harm and might be useful in some applications. Removing them where not directly needed by the codec would not be a problem. I think memory usage caused is measurable (I estimated 4KiB

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Bengt Richter wrote: At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote: Bengt Richter wrote: Please bear with me for a few paragraphs ;-) Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Fredrik Lundh wrote: M.-A. Lemburg wrote: I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is. however, for Python 3000, it would be nice if the source-code

Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
M.-A. Lemburg wrote: Martin v. Löwis wrote: M.-A. Lemburg wrote: I had to create three custom mapping files for cp1140, koi8-u and tis-620. Can you please publish the files you have used somewhere? They best go into the Python CVS. Sure; I'll check in the whole build machinery I'm

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Josiah Carlson wrote: Martin v. Löwis [EMAIL PROTECTED] wrote: Fredrik Lundh wrote: however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings. As MAL

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: A few years ago we had a discussion about this on python-dev and agreed to stick with ASCII identifiers for Python. I still think that's the right way to go. I don't think there ever was such an agreement. You even argued against having non-ASCII

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Martin v. Löwis wrote: M.-A. Lemburg wrote: You even argued against having non-ASCII identifiers: http://mail.python.org/pipermail/python-list/2002-May/102936.html I see :-) It seems I have changed my mind since then (which apparently predates PEP 263). One issue I apparently

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Greg Ewing wrote: M.-A. Lemburg wrote: If you are told to debug a program written by say a Japanese programmer using Japanese identifiers you are going to have a really hard time. Or you could look upon it as an opportunity to broaden your mental horizons by learning some Japanese

Re: [Python-Dev] i18n identifiers

2005-10-31 Thread M.-A. Lemburg
Martin v. Löwis wrote: Steve Holden wrote: Therefore, if such steps are really going to be considered, I would really like to see them introduced in such a way that no breakage occurs for existing users, even the parochial ones who feel they (and their programs) don't need to understand

Re: [Python-Dev] PEP 328 - absolute imports (python-dev sprint at PyCon)

2005-11-01 Thread M.-A. Lemburg
Guido van Rossum wrote: On 11/1/05, Phillip J. Eby [EMAIL PROTECTED] wrote: At 10:22 AM 11/1/2005 -0700, Guido van Rossum wrote: * PEP 328 - absolute/relative import I assume that references to 2.4 in that PEP should be changed to 2.5, and so on. For the part that hasn't been implemented

Re: [Python-Dev] Adding examples to PEP 263

2005-11-04 Thread M.-A. Lemburg
Fredrik Lundh wrote: the runtime warning you get when you use non-ascii characters in python source code points the poor user to this page: http://www.python.org/peps/pep-0263.html which tells the user to add a # -*- coding: <encoding name> -*- to the source, and then provides
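
A minimal sketch of what the PEP 263 declaration does, assuming present-day Python 3 (where the declared encoding applies to the whole file, not only to Unicode literals as in the 2.x of this thread). The source bytes and the `<demo>` file name are made up for the illustration.

```python
# A two-line "source file" held as raw bytes. The byte 0xE9 is "é" in
# Latin-1, and the first line declares that encoding in the PEP 263 style.
source = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"

# compile() reads the coding declaration when it is given bytes and
# decodes the rest of the source accordingly.
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

decoded = namespace["s"]
```

Without the first line the same bytes would be read with the default source encoding and the lone 0xE9 byte would not decode.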

Re: [Python-Dev] str.dedent

2005-11-13 Thread M.-A. Lemburg
Noam Raphael wrote: Following Avi's suggestion, can I raise this thread up again? I think that Reinhold's .dedent() method can be a good idea after all. The idea is to add a method called dedent to strings. It would do exactly what the current textwrap.dedent function does. You are missing
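
For reference, this is the existing standard-library behaviour that the proposed string method would mirror. The sample text is made up; only `textwrap.dedent` itself comes from the library.

```python
import textwrap

# Every line shares four leading spaces; the middle line has two more.
indented = "    first line\n      indented more\n    last line\n"

# dedent() strips the whitespace common to all lines and keeps the
# relative indentation of the middle line.
dedented = textwrap.dedent(indented)
```

The proposal in the thread amounts to spelling this as a method call on the string instead of a function call in `textwrap`.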

Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote: While running regrtest with -R to find reference leaks I found a usage issue. When a codec is registered it is stored in the interpreter state and cannot be removed. Since it is stored as a list, if you repeatedly add the same search function, you will get duplicates in the
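
The duplicate-registration behaviour described here can be sketched like this. It assumes CPython's `codecs.register` still appends without checking for duplicates, as the message describes; the search function and both codec names are hypothetical and exist only for the demonstration.

```python
import codecs

calls = []


def search(name):
    """Hypothetical search function: records each lookup and answers
    only for one made-up alias of Latin-1."""
    calls.append(name)
    if name == "demo_latin1_alias":
        return codecs.lookup("latin-1")
    return None


# Registering the same function twice puts it on the search path twice.
codecs.register(search)
codecs.register(search)

# A name nobody knows walks the whole search path, so every registered
# copy of `search` is asked before LookupError is raised.
try:
    codecs.lookup("demo_no_such_codec")
except LookupError:
    pass
misses = calls.count("demo_no_such_codec")

# A name the function does handle resolves normally.
encoded = "abc".encode("demo_latin1_alias")
```

Under that assumption `misses` ends up greater than one, which is the duplication the message points out.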

Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote: On 11/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Should users have access to the search path (through a codecs.unregister())? Maybe, but why would you want to unregister a search function ? If so, should it search from the end of the list to the beginning to remove
