Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Neil Hodgson wrote:
> I'd like to more tightly define Unicode strings for Python 3000. Currently, Unicode strings may be implemented with either 2 byte (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to contain any Unicode character and should be indexable yielding characters rather than half characters. Therefore Python strings should appear to be UTF-32. There could still be multiple implementations (using UTF-16 or UTF-8) to preserve space but all implementations should appear to be the same apart from speed and memory use.

That's very tricky. If you have multiple implementations, you make usage at the C API difficult. If you make it either UTF-8 or UTF-32, you make PythonWin difficult. If you make it UTF-16, you make indexing difficult.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Phillip J. Eby wrote:
> I'm tempted to say it would be even better if there was a command line option that could be used to force all binary opens to result in bytes, and require all text opens to specify an encoding.

For Python 3000? -1. There shouldn't be command line switches that have that much importance.

For Python 2.x? Well, we are not supposed to discuss this.

Regards,
Martin
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Martin v. Löwis:
> That's very tricky. If you have multiple implementations, you make usage at the C API difficult. If you make it either UTF-8 or UTF-32, you make PythonWin difficult. If you make it UTF-16, you make indexing difficult.

For Windows, the code will get a little uglier, needing to perform an allocation/encoding and deallocation more often than at present, but I don't think there will be a speed degradation, as Windows is currently performing a conversion from 8 bit to UTF-16 inside many system calls. To minimize the cost of allocation, Python could copy Windows in keeping a small number of commonly sized preallocated buffers handy.

For indexing UTF-16, a flag could be set to show if the string is all in the base plane and if not, an index could be constructed when and if needed. It'd be good to get some feel for what proportion of string operations performed require indexing. Many, such as startswith, split, and concatenation, don't require indexing. The proportion of operations that use indexing to scan strings would also be interesting, as adding a (currentIndex, currentOffset) cursor to string objects would be another approach.

Neil
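Neil's flag-plus-lazy-index idea for UTF-16 storage can be sketched in modern Python as below. This is a minimal illustration under stated assumptions, not how any Python release stores strings: the class name `UTF16String` and its attributes are hypothetical, code units are kept in a plain list, and the alternative (currentIndex, currentOffset) cursor approach is not shown.

```python
class UTF16String:
    """Hypothetical sketch: store text as UTF-16 code units, index by code point.

    A flag records whether every character is in the Basic Multilingual Plane
    (the "base plane"). Only if it is not is a code-point -> code-unit offset
    index built, lazily, on the first indexed access.
    """

    def __init__(self, text):
        data = text.encode('utf-16-le')
        # One 16-bit code unit per pair of little-endian bytes.
        self._units = [int.from_bytes(data[i:i + 2], 'little')
                       for i in range(0, len(data), 2)]
        self._length = len(text)                      # length in code points
        self._all_bmp = len(self._units) == self._length
        self._offsets = None                          # built on demand

    def _build_index(self):
        offsets, i = [], 0
        while i < len(self._units):
            offsets.append(i)
            # A high surrogate (0xD800-0xDBFF) starts a two-unit pair.
            i += 2 if 0xD800 <= self._units[i] <= 0xDBFF else 1
        self._offsets = offsets

    def __len__(self):
        return self._length

    def __getitem__(self, n):
        if not -self._length <= n < self._length:
            raise IndexError('string index out of range')
        n %= self._length                             # normalise negative indices
        if self._all_bmp:
            return chr(self._units[n])                # fast path: unit index == code point index
        if self._offsets is None:
            self._build_index()
        start = self._offsets[n]
        hi = self._units[start]
        if 0xD800 <= hi <= 0xDBFF:                    # recombine a surrogate pair
            lo = self._units[start + 1]
            return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
        return chr(hi)
```

For an all-BMP string such as `UTF16String('abc')` the index is never built; for `UTF16String('a\U0001F600b')` the first `s[1]` builds it and returns the whole astral character rather than half of a surrogate pair.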
Re: [Python-Dev] Defining properties - a use case for class decorators?
On 10/23/05, Nick Coghlan [EMAIL PROTECTED] wrote:
> Very nice indeed. I'd be more supportive if it was defined as a new statement such as create with the syntax:
>
>     create TYPE NAME(ARGS):
>         BLOCK

I like it, but it would require a new keyword. Alternatively, one could abuse 'def':

    def TYPE NAME(ARGS):
        BLOCK

but then people would likely be confused, as Skip was earlier in this thread, so I guess 'def' is not an option. IMHO a new keyword could be justified for such a powerful feature, but only Guido's opinion counts on these matters ;)

Anyway, I expected people to criticize the proposal as too powerful and dangerously close to Lisp macros.

Michele Simionato
Re: [Python-Dev] int(string)
Alan McIntyre wrote:
> When running make test I get some errors in test_array and test_compile that did not occur in the build from CVS. Given the inputs to long() have '.' characters in them, I assume that these tests really should be failing as implemented, but I haven't dug into them to see what's going on:
>
> ======================================================================
> ERROR: test_repr (__main__.FloatTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_array.py", line 187, in test_repr
>     self.assertEqual(a, eval(repr(a), {"array": array.array}))
> ValueError: invalid literal for long(): 100.0
>
> ======================================================================
> ERROR: test_repr (__main__.DoubleTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_array.py", line 187, in test_repr
>     self.assertEqual(a, eval(repr(a), {"array": array.array}))
> ValueError: invalid literal for long(): 100.0

I don't have the latest cvs, but in my copy of test_array, the inputs to those two eval calls are

    array('f', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0, 10.0, -100.0])

and

    array('d', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0, 10.0, -100.0])

respectively. if either of those gives "invalid literal for long", something's seriously broken. does a plain

    a = -100.0

still work on your machine?

/F
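The round trip that test_repr exercises can be reproduced on its own. This sketch uses Python 3, where int() has absorbed long(), so the last lines show the analogous error; the values are the ones /F quotes from test_array.

```python
import array

values = [-42.0, 0.0, 42.0, 10.0, -100.0] * 2

for typecode in ('f', 'd'):
    a = array.array(typecode, values)
    text = repr(a)                                   # e.g. "array('d', [-42.0, 0.0, ...])"
    rebuilt = eval(text, {"array": array.array})     # only the name 'array' is bound
    assert rebuilt == a, (typecode, text)

# Float literals like "100.0" must reach the float parser, never the integer one:
print(float('-100.0'))      # -100.0
try:
    int('100.0')
except ValueError as exc:
    print(exc)              # invalid literal for int() with base 10: '100.0'
```

If the repr/eval round trip itself fails, the problem lies in how float literals are parsed, which is why /F suggests checking a plain `a = -100.0` first.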
Re: [Python-Dev] Defining properties - a use case for class decorators?
Michele Simionato [EMAIL PROTECTED] wrote:
> On 10/23/05, Nick Coghlan [EMAIL PROTECTED] wrote:
>> Very nice indeed. I'd be more supportive if it was defined as a new statement such as create with the syntax:
>>
>>     create TYPE NAME(ARGS):
>>         BLOCK
>
> I like it, but it would require a new keyword. Alternatively, one could abuse 'def':
>
>     def TYPE NAME(ARGS):
>         BLOCK
>
> but then people would likely be confused, as Skip was earlier in this thread, so I guess 'def' is not an option. IMHO a new keyword could be justified for such a powerful feature, but only Guido's opinion counts on these matters ;)
>
> Anyway, I expected people to criticize the proposal as too powerful and dangerously close to Lisp macros.

I would criticise it for being dangerously close to worthless. With the minor support code that I (and others) have offered, no new syntax is necessary. You can get the same semantics with...

    class NAME(_(TYPE), ARGS):
        BLOCK

And a suitably defined _. Remember, not every X line function should be made a builtin or syntax.

 - Josiah
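Josiah does not show his `_` here, so the following is only one possible definition, written for Python 3 rather than the 2005 `__metaclass__` hook. Assumptions: the helper ignores `ARGS`, passes the class body's non-dunder names to `TYPE` as keyword arguments, and `_CallFactory` and `Point` are hypothetical names. (Note that `_` also conventionally names gettext's translation function.)

```python
def _(factory):
    """Hypothetical helper: return a dummy base class whose metaclass, instead
    of creating a class, calls factory(**members) with the names bound in the
    class body."""

    class _CallFactory(type):
        def __new__(mcls, name, bases, namespace):
            # Drop bookkeeping entries the class statement adds (__module__, __qualname__, ...).
            members = {key: value for key, value in namespace.items()
                       if not (key.startswith('__') and key.endswith('__'))}
            return factory(**members)

    # Build the base with type.__new__ directly, so _CallFactory.__new__
    # (which would call factory) is not triggered for the base itself.
    return type.__new__(_CallFactory, '_base', (), {})


class Point:
    # This "class" statement is redirected to property(fget=..., fset=...).
    class x(_(property)):
        def fget(self):
            return self._x

        def fset(self, value):
            self._x = value
```

Because the metaclass returns whatever `factory` returns, `Point.x` ends up as an ordinary property object and no class named `x` is ever created; this is the sense in which the semantics need no new syntax.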
Re: [Python-Dev] Defining properties - a use case for class decorators?
On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
> I would criticise it for being dangerously close to worthless. With the minor support code that I (and others) have offered, no new syntax is necessary. You can get the same semantics with...
>
>     class NAME(_(TYPE), ARGS):
>         BLOCK
>
> And a suitably defined _. Remember, not every X line function should be made a builtin or syntax.
>
>  - Josiah

Could you re-read my original message, please? Sugar is *everything* in this case. If the functionality is to be implemented via a __metaclass__ hook, then it should be considered a hack that nobody in his right mind should use. OTOH, if there is a specific syntax for it, then it means the usage has the benediction of the BDFL. This would be a HUGE change. For instance, I would never abuse metaclasses for that, whereas I would freely use a 'create' statement.

Michele Simionato
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Neil Hodgson wrote:
> Guido van Rossum:
>
>> Folks, please focus on what Python 3000 should do.
>>
>> I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But I could use some validation or feedback on this idea from actual practitioners.
>
> I'd like to more tightly define Unicode strings for Python 3000. Currently, Unicode strings may be implemented with either 2 byte (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to contain any Unicode character and should be indexable yielding characters rather than half characters. Therefore Python strings should appear to be UTF-32. There could still be multiple implementations (using UTF-16 or UTF-8) to preserve space but all implementations should appear to be the same apart from speed and memory use.

There seems to be a general misunderstanding here: even if you have UCS4 storage, it is still possible to slice a Unicode string in a way which prevents it from being rendered correctly.

Unicode has the concept of combining code points, e.g. you can store an é (e with an acute accent) as "e" followed by a combining acute accent. Now if you slice off the accent, you'll break the character that you encoded using combining code points. Note that combining code points are rather common in encodings of Asian scripts, so this is not an artificial example.

Some time ago I proposed a new module called unicodeindex to help with indexing. It would solve most of the indexing issues you run into when dealing with Unicode. I've attached it to this email for reference.
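The é example can be checked directly. This Python 3 sketch (where indexing is per code point, i.e. the UCS4 case) shows that slicing by code point can still split a user-perceived character, and that normalization only helps when a precomposed form exists.

```python
import unicodedata

# 'é' stored as two code points: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT.
decomposed = 'caf' + 'e\u0301'
print(len(decomposed))            # 5 code points, although only 4 characters are displayed
print(repr(decomposed[:4]))       # 'cafe' -- the slice silently dropped the accent

# NFC composes the pair into the single code point U+00E9 ...
composed = unicodedata.normalize('NFC', decomposed)
print(len(composed))              # 4
# ... but many base + mark sequences have no precomposed form, so NFC is not a general fix.
print(unicodedata.combining('\u0301'))   # 230: non-zero means "combining mark"
```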
More on the used terms:
http://www.egenix.com/files/python/EuroPython2002-Python-and-Unicode.pdf
http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Oct 24 2005)
Python/Zope Consulting and Support ...http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !

PEP: 0XXX
Title: Unicode Indexing Helper Module
Version: $Revision: 1.0 $
Author: [EMAIL PROTECTED] (Marc-André Lemburg)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 06-Jun-2001
Post-History:

Abstract

    This PEP proposes a new module "unicodeindex" which provides means to index Unicode objects in various higher level abstractions of characters.

Problem and Terminology

    Unicode objects can be indexed just like string objects using what in Unicode terms is called a code unit as index basis.

    Code units are the storage entities used by the Unicode implementation to store a single Unicode information unit and do not necessarily map 1-1 to code points, which are the smallest entities encoded by the Unicode standard.

    Python exposes code units to the programmer via the Unicode object indexing and slicing API, e.g. u[10] or u[12:15] refer to the code units at index 10 and indices 12 to 14.

    These code points can sometimes be composed to form graphemes which are then displayed by the Unicode output device as one character. A word is then a sequence of characters separated by space characters or punctuation; a line is a sequence of code points separated by line breaking code point sequences.

    For addressing Unicode, there are basically five different methods by which you can reference the data:

    1. per code unit (codeunit)
    2. per code point (codepoint)
    3. per grapheme (grapheme)
    4. per word (word)
    5. per line (line)

    The indexing type name is given in parentheses and used in the module interface.

Proposed Solution

    I propose to add a new module to the standard Python library which provides interfaces implementing the above indexing methods.

Module Interface

    The module should provide the following interfaces for all four indexing styles:

    next_indextype(u, index) -> integer

        Returns the Unicode object index for the start of the next indextype found after u[index] or -1 in case no next element of this type exists.

    prev_indextype(u, index) -> integer

        Returns the Unicode object index for the start of the previous indextype found before u[index] or -1 in case no previous element of this type exists.

    indextype_index(u, n) -> integer

        Returns the Unicode object index for the start of the n-th indextype element in u. Raises an IndexError in case no n-th element can be found.

    indextype_count(u, index) -> integer

        Counts the number of complete indextype elements found in u[:index] and returns the count
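To make the proposed interface concrete, here is a minimal sketch of the four functions for the "grapheme" index type. It is not the attached module's implementation: it assumes the simplification that a grapheme starts at every code point that is not a combining mark, which is much cruder than the Unicode text segmentation rules (UAX #29), and it indexes Python 3 strings, whose units are code points.

```python
import unicodedata

# Simplifying assumption: a grapheme starts at any code point that is not a combining mark.
def _is_grapheme_start(u, i):
    return unicodedata.combining(u[i]) == 0

def next_grapheme(u, index):
    """Start of the next grapheme found after u[index], or -1."""
    for i in range(index + 1, len(u)):
        if _is_grapheme_start(u, i):
            return i
    return -1

def prev_grapheme(u, index):
    """Start of the nearest grapheme beginning before u[index], or -1."""
    for i in range(index - 1, -1, -1):
        if _is_grapheme_start(u, i):
            return i
    return -1

def grapheme_index(u, n):
    """Start of the n-th grapheme (counting from 0); IndexError if there is none."""
    seen = -1
    for i in range(len(u)):
        if _is_grapheme_start(u, i):
            seen += 1
            if seen == n:
                return i
    raise IndexError('no grapheme number %d in string' % n)

def grapheme_count(u, index):
    """Number of complete graphemes in u[:index]."""
    starts = [i for i in range(len(u)) if _is_grapheme_start(u, i)]
    ends = starts[1:] + [len(u)]          # each grapheme ends where the next one starts
    return sum(1 for end in ends if end <= index)
```

With `u = 'cafe\u0301!'`, `next_grapheme(u, 3)` skips the combining accent at index 4 and returns 5, and `grapheme_count(u, 4)` is 3 because the "é" grapheme is not complete until index 5.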
Re: [Python-Dev] New codecs checked in
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I've checked in a whole bunch of newly generated codecs which now make use of the faster charmap decoding variant added by Walter a short while ago.
>>
>> Please let me know if you find any problems.
>
> I think we should work on eliminating the decoding_map variables. There are some codecs which rely on them being present in other codecs (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated to use, say
>
>     decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>         0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
>         0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>         0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
>         0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>         0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
>         0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>         0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>         0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>     })
>
> With all these cross-references gone, the decoding_maps could also go.

Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py?

I'd like to suggest a small cosmetic change: gencodec.py should output byte values with two hexdigits instead of four. This makes it easier to see what is a byte value and what is a codepoint. And it would make grepping for stuff simpler.

I.e. change:

    decoding_map.update({
        0x0080: 0x0402, # CYRILLIC CAPITAL LETTER DJE

to

    decoding_map.update({
        0x80: 0x0402, # CYRILLIC CAPITAL LETTER DJE

and

    decoding_table = (
        u'\x00' # 0x0000 -> NULL

to

    decoding_table = (
        u'\x00' # 0x00 -> U+0000 NULL

and

    encoding_map = {
        0x0000: 0x0000, # NULL

to

    encoding_map = {
        0x0000: 0x00, # NULL
Re: [Python-Dev] New codecs checked in
Walter Dörwald wrote:
> Martin v. Löwis wrote:
>> M.-A. Lemburg wrote:
>>> I've checked in a whole bunch of newly generated codecs which now make use of the faster charmap decoding variant added by Walter a short while ago.
>>>
>>> Please let me know if you find any problems.
>>
>> I think we should work on eliminating the decoding_map variables. There are some codecs which rely on them being present in other codecs (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated to use, say
>>
>>     decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>         0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
>>         0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>         0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>         0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>         0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>         0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>         0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>         0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>     })
>>
>> With all these cross-references gone, the decoding_maps could also go.

I just left them in because I thought they wouldn't do any harm and might be useful in some applications. Removing them where not directly needed by the codec would not be a problem.

> Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py?

KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there.

> I'd like to suggest a small cosmetic change: gencodec.py should output byte values with two hexdigits instead of four. This makes it easier to see what is a byte value and what is a codepoint. And it would make grepping for stuff simpler.

True.

I'll rerun the creation with the above changes sometime this week.
--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Defining properties - a use case for class decorators?
Josiah Carlson wrote:
> You can get the same semantics with...
>
>     class NAME(_(TYPE), ARGS):
>         BLOCK
>
> And a suitably defined _. Remember, not every X line function should be made a builtin or syntax.

And this would be an extremely fragile hack that is entirely dependent on the murky rules regarding how Python chooses the metaclass for the newly created class. Ensuring that the metaclass of the class returned by _ was always the one chosen would be tricky at best and impossible at worst.

Even if it *could* be done, I'd never want to see a hack like that in production code I had anything to do with.

And while writing it with __metaclass__ has precisely the correct semantics, that simply isn't as readable as a new block statement would be, nor is it as readable as the current major alternatives (e.g., defining and invoking a factory function).

An alternative to a completely new function would be to simply allow the metaclass to be defined up front, rather than inside the body of the class statement:

    class @TYPE NAME(ARGS):
        BLOCK

For example:

    class @Property x():
        def get(self):
            return self._x
        def set(self, value):
            self._x = value
        def delete(self):
            del self._x

(I put the metaclass after the keyword, because, unlike a function decorator, the metaclass is invoked *before* the class is created, and because you're only allowed one explicit metaclass.)

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://boredomandlaziness.blogspot.com
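The "defining and invoking a factory function" alternative Nick mentions works in any Python version with properties. A sketch of it follows; the class `Circle` and its docstring are illustrative only. The factory's local namespace plays the role of the proposed BLOCK, and the factory name is immediately rebound to the property it returns.

```python
class Circle:
    def radius():
        # Local functions stand in for the body of the proposed statement.
        def fget(self):
            return self._radius

        def fset(self, value):
            self._radius = value

        def fdel(self):
            del self._radius

        return property(fget, fset, fdel, "The circle's radius.")

    radius = radius()   # invoke the factory; the name now refers to the property
```

The cost is the repeated name and the trailing call, which is exactly the boilerplate a `create`-style statement or `class @TYPE NAME` syntax would remove.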
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Bengt Richter wrote:
> Please bear with me for a few paragraphs ;-)

Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell the parser how to convert string literals (currently only the Unicode ones) into constant Unicode objects within the program text. It's also a nice way to let other people know what kind of encoding you used to write your comments ;-)

Nothing more.

Once a module is compiled, there's no distinction between a module using the latin-1 source code encoding or one using the utf-8 encoding.

Thanks,
--
Marc-Andre Lemburg
eGenix.com
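MAL's point can be demonstrated with compile(), which honours a coding declaration when given source bytes. In this Python 3 sketch (where every str literal is Unicode) the same module text is encoded as latin-1 and as utf-8; the byte sequences differ, but the compiled constants do not. The module text and file names are made up for the example.

```python
text = "# -*- coding: {0} -*-\ngreeting = 'caf\u00e9'\n"

latin1_source = text.format('latin-1').encode('latin-1')   # 'é' stored as b'\xe9'
utf8_source = text.format('utf-8').encode('utf-8')         # 'é' stored as b'\xc3\xa9'

latin1_code = compile(latin1_source, 'latin1_module.py', 'exec')
utf8_code = compile(utf8_source, 'utf8_module.py', 'exec')

latin1_ns, utf8_ns = {}, {}
exec(latin1_code, latin1_ns)
exec(utf8_code, utf8_ns)

print(latin1_ns['greeting'] == utf8_ns['greeting'] == 'caf\u00e9')   # True
print(latin1_code.co_consts == utf8_code.co_consts)                  # True
```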
Re: [Python-Dev] PEP 351, the freeze protocol
Barry Warsaw wrote:
> I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.
>
> http://www.python.org/peps/pep-0351.html

I think it's definitely worth considering. It may also reduce the need for "x" and "frozenx" builtin pairs. We already have set and frozenset, and the various bytes ideas that have been kicked around have generally considered the need for a frozenbytes as well.

If freeze was available, then freeze(x(*args)) might serve as a replacement for any builtin frozen variants.

I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:

    d = {}
    l = []
    d[l] = "Oops!"
    d[l] # Raises KeyError if freeze() isn't also invoked in __getitem__

    d = {}
    l = []
    d[l] = "Oops!"
    l.append(1)
    d[l] # Raises KeyError regardless

Oh, and the PEP's xdict example is even more broken than the PEP implies, because two imdicts which compare equal (same contents) may not hash equal (different id's).

Cheers,
Nick.
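For readers without the PEP at hand, here is a minimal sketch of the freeze() behaviour PEP 351 describes: hashable objects come back unchanged, anything else must supply `__freeze__()`. freeze() is not a real builtin, and the `xlist` class is a hypothetical example of an object that knows how to freeze itself.

```python
def freeze(obj):
    """Sketch of the proposed protocol (not a real builtin)."""
    try:
        hash(obj)
        return obj                      # already hashable: returned unchanged
    except TypeError:
        freezer = getattr(obj, '__freeze__', None)
        if freezer is None:
            raise TypeError('object is not freezable: %r' % (obj,)) from None
        return freezer()                # ask the object for an immutable equivalent


class xlist(list):
    """Hypothetical list subclass that freezes itself into a tuple."""
    def __freeze__(self):
        return tuple(self)


d = {freeze(xlist([1, 2])): 'found'}
print(d[(1, 2)])                        # found
```

Note that the frozen key is a snapshot: mutating the original `xlist` afterwards does not change the key, which is why Nick's second case above misses no matter where freeze() is called.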
Re: [Python-Dev] PEP 351, the freeze protocol
On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
>> Should dicts and sets automatically freeze their mutable keys?
>
> Dictionaries don't have mutable keys,

Since when?

    class Foo:
        def __init__(self):
            self.x = 1

    f = Foo()
    d = {f: 1}
    f.x = 2

Maybe you meant something else? I can't think of any way in which "dictionaries don't have mutable keys" is true. The only rule about dictionary keys that I know of is that they need to be hashable and need to be comparable with the equality operator.

--
Twisted | Christopher Armstrong: International Man of Twistery Radix|-- http://radix.twistedmatrix.com | Release Manager, Twisted Project \\\V/// |-- http://twistedmatrix.com |o O|| wvw-+
Re: [Python-Dev] PEP 351, the freeze protocol
Christopher Armstrong [EMAIL PROTECTED] wrote:
> On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
>>> Should dicts and sets automatically freeze their mutable keys?
>>
>> Dictionaries don't have mutable keys,
>
> Since when?
>
> Maybe you meant something else? I can't think of any way in which "dictionaries don't have mutable keys" is true. The only rule about dictionary keys that I know of is that they need to be hashable and need to be comparable with the equality operator.

Good point, I forgot about user-defined classes (I rarely use them as keys myself; it's all too easy to make a mutable whose hash is dependent on mutable contents, as having an object which you can only find if you have the exact object is not quite as useful as I generally need).

I will, however, stand by "a container which is frozen should have its contents frozen as well."

 - Josiah
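The two kinds of mutable key discussed here behave quite differently, as this sketch shows (`Key` is a hypothetical class; the lookups were traced against CPython's dict, which stores each key's hash when the key is inserted). Christopher's `Foo` uses the default identity hash, so mutating it is harmless; a key whose hash depends on mutable contents becomes unreachable after mutation.

```python
class Foo:
    """Christopher's example: default identity-based hash and equality."""
    def __init__(self):
        self.x = 1


class Key:
    """A mutable key whose hash depends on its mutable contents (the risky case)."""
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Key) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


f = Foo()
d = {f: 'found'}
f.x = 2
print(d[f])           # found -- the identity hash is unaffected by the mutation

k = Key(1)
d2 = {k: 'found'}
k.x = 2
print(k in d2)        # False -- k now hashes differently from the stored entry
print(Key(1) in d2)   # False -- the hash matches, but the stored key no longer compares equal
print(len(d2))        # 1 -- the entry is still there, just unreachable by lookup
```

This is the "only find it if you have the exact object" trade-off Josiah mentions: identity hashing keeps mutable keys findable but makes equal-looking objects distinct keys.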
Re: [Python-Dev] New codecs checked in
M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>> Martin v. Löwis wrote:
>>> M.-A. Lemburg wrote:
>>>> I've checked in a whole bunch of newly generated codecs which now make use of the faster charmap decoding variant added by Walter a short while ago.
>>>>
>>>> Please let me know if you find any problems.
>>>
>>> I think we should work on eliminating the decoding_map variables. There are some codecs which rely on them being present in other codecs (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated to use, say
>>>
>>>     decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>>         0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
>>>         0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>>         0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>>         0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>>         0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>>         0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>>         0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>>         0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>>     })
>>>
>>> With all these cross-references gone, the decoding_maps could also go.
>
> I just left them in because I thought they wouldn't do any harm and might be useful in some applications. Removing them where not directly needed by the codec would not be a problem.

Recreating them is quite simple via dict(enumerate(decoding_table)), so I think we should remove them.

>> Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py?
>
> KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there.

OK, so we'd need something that creates a new decoding table from an old one + changes, i.e. something like:

    def update_decoding_table(table, new):
        table = list(table)
        for (key, value) in new.iteritems():
            table[key] = unichr(value)
        return u"".join(table)

>> I'd like to suggest a small cosmetic change: gencodec.py should output byte values with two hexdigits instead of four. This makes it easier to see what is a byte value and what is a codepoint. And it would make grepping for stuff simpler.
>
> True. I'll rerun the creation with the above changes sometime this week.

Great, thanks!

Bye,
   Walter Dörwald
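A Python 3 rendering of Walter's helper (chr() and items() replace unichr() and iteritems()), used to derive a KOI8-U table from KOI8-R, plus the decoding_map recovery MAL mentions. This is a sketch: `KOI8_U_CHANGES` is an illustrative name holding the eight entries from Martin's dict, and because decoding_map values are code point integers the sketch applies ord(); a bare dict(enumerate(table)) would map bytes to one-character strings instead.

```python
def update_decoding_table(table, changes):
    """Return a copy of a 256-character decoding table with some bytes remapped.

    table   -- str of length 256; table[byte] is the decoded character
    changes -- dict mapping byte value -> Unicode code point (int)
    """
    chars = list(table)                    # str -> list of 1-character strings
    for byte, codepoint in changes.items():
        chars[byte] = chr(codepoint)
    return ''.join(chars)


# Positions where KOI8-U differs from KOI8-R (the eight entries in Martin's dict).
KOI8_U_CHANGES = {
    0xA4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xA6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xA7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xAD: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xB4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xB6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xB7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xBD: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
}

# Every byte is defined in KOI8-R, so decoding all 256 bytes yields its full table.
koi8_r_table = bytes(range(256)).decode('koi8_r')
koi8_u_table = update_decoding_table(koi8_r_table, KOI8_U_CHANGES)

# A decoding_map ({byte: code point}) can be recreated from the table when needed.
koi8_u_map = {byte: ord(char) for byte, char in enumerate(koi8_u_table)}
```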
Re: [Python-Dev] Defining properties - a use case for class decorators?
Nick Coghlan [EMAIL PROTECTED] writes:
> Josiah Carlson wrote:
>> You can get the same semantics with...
>>
>>     class NAME(_(TYPE), ARGS):
>>         BLOCK
>>
>> And a suitably defined _. Remember, not every X line function should be made a builtin or syntax.
>
> And this would be an extremely fragile hack that is entirely dependent on the murky rules regarding how Python chooses the metaclass for the newly created class.

Uh, not really. In the presence of base classes it's always the type of the first base. The reason it might not seem this simple is that most metaclasses end up calling type.__new__ at some point and this function does more complicated things (such as checking for metaclass conflict and deferring to the most specific metaclass).

Not sure what the context is here, but I have to butt in when I see people complicating things which aren't actually that complicated...

Cheers,
mwh

--
There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere.
   -- Johann Hibschman, comp.lang.scheme
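The two type.__new__ behaviours mwh mentions can be observed directly. This sketch uses Python 3 syntax; note that Python 3 computes the most derived metaclass before calling anything, so the literal "type of the first base" rule mwh describes for Python 2 no longer applies, but for these two cases the outcome is the same. All class names are illustrative.

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

class MetaAChild(MetaA):
    pass

class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass

class AChild(metaclass=MetaAChild):
    pass

# Deferring to the most specific metaclass: A is the first base,
# but MetaAChild is a subclass of MetaA, so it wins.
class C(A, AChild):
    pass

print(type(C).__name__)        # MetaAChild

# Conflict check: MetaA and MetaB are unrelated, so neither can be chosen.
try:
    class D(A, B):
        pass
except TypeError as exc:
    print('TypeError:', exc)
```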
Re: [Python-Dev] PEP 351, the freeze protocol
Nick Coghlan [EMAIL PROTECTED] wrote:
> I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:

I'm pretty sure that the PEP was only asking if one would freeze the contents of dicts IF the dict was being frozen.

That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

 - Josiah
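Josiah's second option (freezing a container also freezes its contents) can be sketched as a recursive function. The `frozendict` class below is hypothetical (not a builtin, and unrelated to any third-party package of that name). It hashes by content rather than by id, which also addresses Nick's objection that equal imdicts in the PEP's example may not hash equal.

```python
class frozendict(dict):
    """Hypothetical immutable dict whose hash is based on its contents."""

    def __hash__(self):
        # Equal contents -> equal hash (unlike hashing by id()).
        return hash(frozenset(self.items()))

    def _readonly(self, *args, **kwargs):
        raise TypeError('frozendict is immutable')

    __setitem__ = __delitem__ = clear = pop = popitem = setdefault = update = _readonly
    __ior__ = _readonly


def deep_freeze(obj):
    """Freeze a container and, recursively, everything it contains."""
    if isinstance(obj, dict):
        return frozendict((deep_freeze(k), deep_freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(deep_freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in obj)
    hash(obj)       # anything else must already be hashable (raises TypeError otherwise)
    return obj


frozen = deep_freeze({1: [2, 3, 4]})
print(frozen[1])    # (2, 3, 4)
```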
Re: [Python-Dev] Defining properties - a use case for class decorators?
Michele Simionato [EMAIL PROTECTED] wrote:
> On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
>> I would criticise it for being dangerously close to worthless. With the minor support code that I (and others) have offered, no new syntax is necessary. You can get the same semantics with...
>>
>>     class NAME(_(TYPE), ARGS):
>>         BLOCK
>>
>> And a suitably defined _. Remember, not every X line function should be made a builtin or syntax.
>>
>>  - Josiah
>
> Could you re-read my original message, please? Sugar is *everything* in this case. If the functionality is to be implemented via a __metaclass__ hook, then it should be considered a hack that nobody in his right mind should use. OTOH, if there is a specific syntax for it, then it means the usage has the benediction of the BDFL. This would be a HUGE change. For instance, I would never abuse metaclasses for that, whereas I would freely use a 'create' statement.

Metaclass abuse? Oh, I'm sorry, I thought that the point of metaclasses was to offer a way to make magic happen in a somewhat pragmatic manner, you know, through metaprogramming. I would call this particular use a practical application of standard Python semantics.

Pardon me while I attempt to re-parse your above statement...

    If there is a specific syntax for [passing a temporary namespace to a callable, created by some sort of block mechanism], then [using it for property creation] has the benediction of the BDFL.

What I'm trying to say is that it already has a no-syntax syntax. It uses the magic of metaclasses, but one can make that magic as explicit as necessary.

    class NAME(PassNamespaceFromClassBlock(fcn=TYPE, args=ARGS)):
        BLOCK

Personally, I've not seen the desire to pass temporary namespaces to functions until recently, so whether or not people will use it for property creation, or any other way that people would find interesting and/or useful, is at least a bit of prediction. Maybe people will prefer to use property('get_foo', 'set_foo', 'del_foo'), who knows?
But you know what? Regardless of what people want, they can use metaclasses right now to create properties, whereas they would have to wait until Python 2.5 comes out before they could use this proposed 'create' statement.

 - Josiah
Re: [Python-Dev] Defining properties - a use case for class decorators?
On 24-Oct-2005, at 12:54, Josiah Carlson wrote:
> Metaclass abuse? Oh, I'm sorry, I thought that the point of metaclasses was to offer a way to make magic happen in a somewhat pragmatic manner, you know, through metaprogramming. I would call this particular use a practical application of standard Python semantics.

I'd say using a class statement to define a property is metaclass abuse, as would anything that wouldn't define something class-like. The same is true for other constructs: using a decorator to define something that is not a callable would IMHO also be abuse.

That said, I don't really have an opinion on the 'create' statement proposal yet. It does seem to have a very limited field of use. I'm quite happy with using property as it is; property('get_foo', 'set_foo') would take away most if not all of the remaining problems.

Ronald
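The built-in property does not accept method names, so the `property('get_foo', 'set_foo')` spelling Ronald and Josiah mention would need a helper. The sketch below is hypothetical (`named_property` and the example classes are invented for illustration). Because accessor lookup happens by name on every access, a subclass that overrides `get_foo` changes the property's behaviour without redefining it, which is the remaining problem such a spelling would address.

```python
def named_property(getter_name, setter_name=None, deleter_name=None, doc=None):
    """Hypothetical helper: a property whose accessors are looked up by name."""
    def fget(self):
        return getattr(self, getter_name)()

    fset = fdel = None
    if setter_name is not None:
        def fset(self, value):
            getattr(self, setter_name)(value)
    if deleter_name is not None:
        def fdel(self):
            getattr(self, deleter_name)()

    return property(fget, fset, fdel, doc)


class Base:
    def __init__(self):
        self._foo = 0

    def get_foo(self):
        return self._foo

    def set_foo(self, value):
        self._foo = value

    foo = named_property('get_foo', 'set_foo')


class Doubled(Base):
    def get_foo(self):              # overriding the accessor is enough
        return 2 * self._foo


d = Doubled()
d.foo = 21
print(d.foo)                        # 42
```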
Re: [Python-Dev] KOI8_U (New codecs checked in)
Walter Dörwald wrote:
>>> Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py?
>>
>> KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there.
>
> OK, so we'd need something that creates a new decoding table from an old one + changes, i.e. something like:
>
>     def update_decoding_table(table, new):
>         table = list(table)
>         for (key, value) in new.iteritems():
>             table[key] = unichr(value)
>         return u"".join(table)

Actually, I'd rather have some official mapping files for these. Perhaps we could get someone to upload a mapping file for KOI8_U to the Unicode site ?!

The mapping is defined in RFC2319:

    http://www.faqs.org/rfcs/rfc2319.html

I've put Alexander Yeremenko, the coordinator of the KOI8-U group, on CC.

Thanks,
--
Marc-Andre Lemburg
eGenix.com
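Working from an official mapping file, as MAL prefers, means parsing lines of the form "byte, code point, # comment" (the layout of the KOI8-U file attached later in this thread). The sketch below builds a 256-character decoding table from such lines and feeds it to `codecs.charmap_decode`. It is not gencodec.py; `parse_mapping_file` is a hypothetical name, and filling undefined bytes with U+FFFE follows the undefined-entry marker assumed here for generated charmap tables.

```python
import codecs

def parse_mapping_file(lines):
    """Build a 256-character decoding table from Unicode-style mapping lines.

    Each data line looks like "0xA4<TAB>0x0454<TAB># CYRILLIC SMALL LETTER ...":
    byte value, Unicode code point, optional comment ('#' starts a comment).
    Bytes that are never mapped are left as U+FFFE (treated as undefined).
    """
    table = ['\ufffe'] * 256
    for line in lines:
        fields = line.split('#', 1)[0].split()
        if len(fields) < 2:
            continue                  # blank, comment-only, or unmapped byte
        byte, codepoint = int(fields[0], 16), int(fields[1], 16)
        table[byte] = chr(codepoint)
    return ''.join(table)


sample = [
    '#',
    '#\tName:\tKOI8-U (RFC2319) to Unicode',
    '0x41\t0x0041\t# LATIN CAPITAL LETTER A',
    '0xA4\t0x0454\t# CYRILLIC SMALL LETTER UKRAINIAN IE',
]
table = parse_mapping_file(sample)
print(codecs.charmap_decode(b'A\xa4', 'strict', table))   # ('Aє', 2)
```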
Re: [Python-Dev] KOI8_U (New codecs checked in)
M.-A. Lemburg wrote: Walter Dörwald wrote: Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py? KOI8-U is not available as a mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there. OK, so we'd need something that creates a new decoding table from an old one + changes, i.e. something like:

    def update_decoding_table(table, new):
        table = list(table)
        for (key, value) in new.iteritems():
            table[key] = unichr(value)
        return u''.join(table)

Actually, I'd rather have some official mapping files for these. Perhaps we could get someone to upload a mapping file for KOI8_U to the Unicode site?! The mapping is defined in RFC2319: http://www.faqs.org/rfcs/rfc2319.html I've put Alexander Yeremenko, the coordinator of the KOI8-U group, on CC. Hmm, that email address bounces. I've now put Maxim on CC: Maxim Dzumanenko [EMAIL PROTECTED] Here's a mapping file for KOI8-U - please check whether it's correct. Thanks, -- Marc-Andre Lemburg

#
#	Name:	KOI8-U (RFC2319) to Unicode
#
#	See RFC2319 for details. This encoding is a modified KOI8-R
#	encoding.
#
0x00	0x0000	# NULL
0x01	0x0001	# START OF HEADING
0x02	0x0002	# START OF TEXT
0x03	0x0003	# END OF TEXT
0x04	0x0004	# END OF TRANSMISSION
0x05	0x0005	# ENQUIRY
0x06	0x0006	# ACKNOWLEDGE
0x07	0x0007	# BELL
0x08	0x0008	# BACKSPACE
0x09	0x0009	# HORIZONTAL TABULATION
0x0A	0x000A	# LINE FEED
0x0B	0x000B	# VERTICAL TABULATION
0x0C	0x000C	# FORM FEED
0x0D	0x000D	# CARRIAGE RETURN
0x0E	0x000E	# SHIFT OUT
0x0F	0x000F	# SHIFT IN
0x10	0x0010	# DATA LINK ESCAPE
0x11	0x0011	# DEVICE CONTROL ONE
0x12	0x0012	# DEVICE CONTROL TWO
0x13	0x0013	# DEVICE CONTROL THREE
0x14	0x0014	# DEVICE CONTROL FOUR
0x15	0x0015	# NEGATIVE ACKNOWLEDGE
0x16	0x0016	# SYNCHRONOUS IDLE
0x17	0x0017	# END OF TRANSMISSION BLOCK
0x18	0x0018	# CANCEL
0x19	0x0019	# END OF MEDIUM
0x1A	0x001A	# SUBSTITUTE
0x1B	0x001B	# ESCAPE
0x1C	0x001C	# FILE SEPARATOR
0x1D	0x001D	# GROUP SEPARATOR
0x1E	0x001E	# RECORD SEPARATOR
0x1F	0x001F	# UNIT SEPARATOR
0x20	0x0020	# SPACE
0x21	0x0021	# EXCLAMATION MARK
0x22	0x0022	# QUOTATION MARK
0x23	0x0023	# NUMBER SIGN
0x24	0x0024	# DOLLAR SIGN
0x25	0x0025	# PERCENT SIGN
0x26	0x0026	# AMPERSAND
0x27	0x0027	# APOSTROPHE
0x28	0x0028	# LEFT PARENTHESIS
0x29	0x0029	# RIGHT PARENTHESIS
0x2A	0x002A	# ASTERISK
0x2B	0x002B	# PLUS SIGN
0x2C	0x002C	# COMMA
0x2D	0x002D	# HYPHEN-MINUS
0x2E	0x002E	# FULL STOP
0x2F	0x002F	# SOLIDUS
0x30	0x0030	# DIGIT ZERO
0x31	0x0031	# DIGIT ONE
0x32	0x0032	# DIGIT TWO
0x33	0x0033	# DIGIT THREE
0x34	0x0034	# DIGIT FOUR
0x35	0x0035	# DIGIT FIVE
0x36	0x0036	# DIGIT SIX
0x37	0x0037	# DIGIT SEVEN
0x38	0x0038	# DIGIT EIGHT
0x39	0x0039	# DIGIT NINE
0x3A	0x003A	# COLON
0x3B	0x003B	# SEMICOLON
0x3C	0x003C	# LESS-THAN SIGN
0x3D	0x003D	# EQUALS SIGN
0x3E	0x003E	# GREATER-THAN SIGN
0x3F	0x003F	# QUESTION MARK
0x40	0x0040	# COMMERCIAL AT
0x41	0x0041	# LATIN CAPITAL LETTER A
0x42	0x0042	# LATIN CAPITAL LETTER B
0x43	0x0043	# LATIN CAPITAL LETTER C
0x44	0x0044	# LATIN CAPITAL LETTER D
0x45	0x0045	# LATIN CAPITAL LETTER E
0x46	0x0046	# LATIN CAPITAL LETTER F
0x47	0x0047	# LATIN CAPITAL LETTER G
0x48	0x0048	# LATIN CAPITAL LETTER H
0x49	0x0049	# LATIN CAPITAL LETTER I
0x4A	0x004A	# LATIN CAPITAL LETTER J
0x4B	0x004B	# LATIN CAPITAL LETTER K
0x4C	0x004C	# LATIN CAPITAL LETTER L
0x4D	0x004D	# LATIN CAPITAL LETTER M
0x4E	0x004E	# LATIN CAPITAL LETTER N
0x4F	0x004F	# LATIN CAPITAL LETTER O
0x50	0x0050	# LATIN CAPITAL LETTER P
0x51	0x0051	# LATIN CAPITAL LETTER Q
0x52	0x0052	# LATIN CAPITAL LETTER R
0x53	0x0053	# LATIN CAPITAL LETTER S
0x54	0x0054	# LATIN
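A mapping file in this format (byte value, code point, comment) is straightforward to turn into a 256-character decoding table. The following is a minimal Python 3 sketch; parse_mapping is a hypothetical helper, and it marks bytes the file does not mention with U+FFFE, the value the generated stdlib charmap tables use for undefined bytes.

```python
def parse_mapping(text):
    """Build a 256-character decoding table from mapping-file lines.

    Data lines look like '0x41<TAB>0x0041<TAB># LATIN CAPITAL LETTER A'.
    Comment-only lines, blank lines and bytes with no code point are
    skipped; unmentioned bytes stay U+FFFE ("undefined").
    """
    table = ['\ufffe'] * 256
    for raw in text.splitlines():
        fields = raw.split('#', 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue  # blank line, pure comment, or unmapped byte
        byte, codepoint = int(fields[0], 16), int(fields[1], 16)
        table[byte] = chr(codepoint)
    return ''.join(table)
```

The resulting string can be passed directly to codecs.charmap_decode as its mapping argument.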
Re: [Python-Dev] Defining properties - a use case for class decorators?
On 10/24/05, Ronald Oussoren [EMAIL PROTECTED] wrote: I'd say using a class statement to define a property is metaclass abuse, as would anything that wouldn't define something class-like. The same is true for other constructs, using a decorator to define something that is not a callable would IMHO also be abuse. +1 That said, I don't really have an opinion on the 'create' statement proposal yet. It does seem to have a very limited field of use. This is definitely not true. The 'create' statement would have lots of applications. Off the top of my head I can think of 'create' applied to:
- bunches;
- modules;
- interfaces;
- properties;
- usage in frameworks, for instance providing sugar for Object-Relational mappers, for making templates (e.g. a create HTMLPage);
- building custom minilanguages;
- ...
This is why I see a 'create' statement as a frighteningly powerful addition to the language. Michele Simionato
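For readers unfamiliar with the trick Ronald and Michele call metaclass abuse: at the end of a class statement Python calls metaclass(name, bases, namespace) and binds whatever it returns to the class name, so a callable that returns a property turns a class body into a property definition. The following Python 3 sketch shows the pattern; make_property and Point are hypothetical names, and as far as this thread indicates, the 'create' proposal would offer a dedicated spelling for this kind of "call something with a name and a namespace" construct.

```python
def make_property(name, bases, namespace):
    """Used as a 'metaclass': turn a class body into a property.

    Looks for functions named get/set/delete and a docstring in the
    class body, and returns a property built from them.
    """
    return property(namespace.get('get'), namespace.get('set'),
                    namespace.get('delete'), namespace.get('__doc__'))


class Point:
    def __init__(self, x):
        self._x = x

    class x(metaclass=make_property):
        """The x coordinate."""

        def get(self):
            return self._x

        def set(self, value):
            self._x = value
```

After the class statement runs, Point.x is an ordinary property object, not a class.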
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
Guido van Rossum wrote: Right. That was my point. Nick's worried about undecorated __context__ because he wants to endow generators with a different default __context__. I say no to both proposals and the worries cancel each other out. EIBTI. Works for me. That makes the resolutions for the posted issues:
1. The slot name __context__ will be used instead of __with__
2. The builtin name context is currently off-limits due to its ambiguity
3a. generator-iterators do NOT have a native context
3b. Use contextmanager as a builtin decorator to get generator-contexts
4. The __context__ slot will NOT be special cased
I'll add those into the PEP and reference this thread after Martin is done with the SVN migration. However, those resolutions bring up the following issues:
5a. What exception is raised when EXPR does not have a __context__ method?
5b. What about when the returned object is missing __enter__ or __exit__?
I suggest raising TypeError in both cases, for symmetry with for loops. The slot check is made in C code, so I don't see any difficulty in raising TypeError instead of AttributeError if the relevant slots aren't filled.
6a. Should a generic closing context manager be provided?
6b. If yes, should it be a builtin or in a contexttools module?
I'm not too worried about this one for the moment, and it could easily be left out of the PEP itself. Of the sample managers, it seems the most universally useful, though. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://boredomandlaziness.blogspot.com
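Issue 6a asks about a generic "closing" manager. The following is a minimal sketch of what such a manager does, written against the __enter__/__exit__ protocol that Python's with statement eventually shipped with (modern Python provides an equivalent as contextlib.closing); the class here is a standalone illustration.

```python
class closing:
    """Call thing.close() when the with-block exits, however it exits."""

    def __init__(self, thing):
        self.thing = thing

    def __enter__(self):
        # The with-statement target is the wrapped object itself.
        return self.thing

    def __exit__(self, exc_type, exc_value, traceback):
        self.thing.close()
        return False  # never swallow exceptions raised in the block
```

Usage: `with closing(resource) as r: ...` guarantees resource.close() runs even if the block raises.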
Re: [Python-Dev] New codecs checked in
Walter Dörwald wrote: I'd like to suggest a small cosmetic change: gencodec.py should output byte values with two hex digits instead of four. This makes it easier to see what is a byte value and what is a codepoint. And it would make grepping for stuff simpler. True. I'll rerun the creation with the above changes sometime this week. Great, thanks! Done. I had to create three custom mapping files for cp1140, koi8-u and tis-620. If you want more non-standard charmap codecs converted, please send me the mapping files in the Unicode standard format for these files. Thanks, -- Marc-Andre Lemburg
Re: [Python-Dev] PEP 351, the freeze protocol
I'm not sure I completely understood the idea, but deriving the freeze function from hash gives hash a wider importance. Is __hash__=id inside a class enough to use a set (sets.Set before 2.5) derived class instance as a key to a mapping? Surely I missed the point. Regards Paolino
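On Paolino's question: an identity-based hash only works as a mapping key if equality is identity-based too, because equal objects must have equal hashes and a set subclass would otherwise still compare by contents. The following Python 3 sketch (IdentitySet is a hypothetical name) overrides __eq__, __ne__ and __hash__ together; defining __hash__ as a method rather than assigning the builtin id avoids relying on how a non-method builtin is called from a class attribute.

```python
class IdentitySet(set):
    """A set subclass usable as a dict key, compared by object identity.

    Overriding __eq__ in a class body would normally reset __hash__ to
    None, so __hash__ is defined in the same class body.
    """

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        # set defines its own __ne__, so it must be overridden as well.
        return self is not other

    def __hash__(self):
        return id(self)
```

Two IdentitySet instances with the same members are then distinct keys, and mutating one does not change its hash.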
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
On 10/24/05, Nick Coghlan [EMAIL PROTECTED] wrote: That makes the resolutions for the posted issues:
1. The slot name __context__ will be used instead of __with__
2. The builtin name context is currently off-limits due to its ambiguity
3a. generator-iterators do NOT have a native context
3b. Use contextmanager as a builtin decorator to get generator-contexts
4. The __context__ slot will NOT be special cased
+1 I'll add those into the PEP and reference this thread after Martin is done with the SVN migration. However, those resolutions bring up the following issues:
5a. What exception is raised when EXPR does not have a __context__ method?
5b. What about when the returned object is missing __enter__ or __exit__?
I suggest raising TypeError in both cases, for symmetry with for loops. The slot check is made in C code, so I don't see any difficulty in raising TypeError instead of AttributeError if the relevant slots aren't filled. Why are you so keen on TypeError? I find AttributeError totally appropriate. I don't see symmetry with for-loops as a valuable property here. AttributeError and TypeError are often interchangeable anyway.
6a. Should a generic closing context manager be provided?
No. Let's provide the minimal mechanisms FIRST.
6b. If yes, should it be a builtin or in a contexttools module?
I'm not too worried about this one for the moment, and it could easily be left out of the PEP itself. Of the sample managers, it seems the most universally useful, though. Let's leave some examples just be examples. I think I'm leaning towards adding __context__ to locks (all types defined in thread or threading, including condition variables), files, and decimal.Context, and leave it at that. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
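Issue 5 is about what the with statement raises when the protocol methods are missing. The following Python 3 sketch mimics the PEP 343 expansion using only __enter__ and __exit__ (the protocol the with statement eventually shipped with), looks the methods up on the type rather than the instance, and reports a missing method as TypeError, following Nick's suggestion rather than Guido's preference for AttributeError. run_with is a hypothetical helper, not an existing API.

```python
import sys


def run_with(mgr, body):
    """Rough equivalent of:  with mgr as value: return body(value)"""
    cls = type(mgr)
    try:
        # Special methods are looked up on the type, as in PEP 343.
        exit_ = cls.__exit__
        enter = cls.__enter__
    except AttributeError as err:
        raise TypeError(
            f"{cls.__name__!r} object does not support the "
            "context manager protocol") from err
    value = enter(mgr)
    try:
        result = body(value)
    except BaseException:
        # __exit__ sees the exception and may suppress it by returning true.
        if not exit_(mgr, *sys.exc_info()):
            raise
        return None
    exit_(mgr, None, None, None)
    return result
```

For comparison, recent CPython releases (3.11 onward) also raise TypeError for the real with statement when an object lacks these methods.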
Re: [Python-Dev] PEP 351, the freeze protocol
On Oct 23, 2005, at 6:43 PM, Barry Warsaw wrote: I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest. http://www.python.org/peps/pep-0351.html I like this. I'd like it better if it integrated with the adapter PEP, so that the freezing mechanism for a given type could be pluggable, and could be provided even if the original object did not contemplate it. I don't know where the adapter PEP stands: skimming through the (most recent?) thread in January didn't give me a clear idea. As another poster mentioned, in-place freezing is also of interest to me (and why I read the PEP initially), but, as also mentioned, that's probably unrelated to your PEP. Gary
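Gary's wish for a pluggable freezing mechanism does not need the adapter PEP in modern Python: functools.singledispatch provides a per-type registry to which third parties can add types the original author never anticipated. The following is a sketch under that substitution; the recursive freezing of nested items and the dict handler are choices of this example, not something PEP 351 specifies.

```python
from functools import singledispatch


@singledispatch
def freeze(obj):
    """Default: hashable objects are returned as-is; others raise TypeError."""
    hash(obj)  # raises TypeError for unhashable objects
    return obj


@freeze.register(list)
def _(obj):
    return tuple(freeze(item) for item in obj)


@freeze.register(set)
def _(obj):
    return frozenset(freeze(item) for item in obj)


# A third party can plug in a type without touching the code above.
@freeze.register(dict)
def _(obj):
    return frozenset((key, freeze(value)) for key, value in obj.items())
```

Because dispatch follows the class hierarchy, subclasses of list, set and dict pick up the registered handlers automatically.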
Re: [Python-Dev] int(string)
Fredrik Lundh wrote: does a plain a = -100.0 still work on your machine? D'oh - I seriously broke something, then, because it didn't. funny_falcon commented on the patch in SF and suggested a change that took care of that. I've uploaded the corrected version of the patch, which now passes all the tests. Alan
Re: [Python-Dev] PEP 351, the freeze protocol
[Barry Warsaw] I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects.

* FWIW, the _as_immutable() protocol was dropped from sets.py for a reason. User reports indicated that it was never helpful in practice. It added complexity and confusion without producing offsetting benefits.

* AFAICT, there are no use cases for freezing arbitrary objects when the object types are restricted to just lists and sets but not dicts, arrays, or other containers. Even if the range of supported types were expanded, what applications could use this? Most apps cannot support generic substitution of lists and sets -- they have too few methods in common -- they are almost never interchangeable.

* I'm concerned that generic freezing leads to poor design and hard-to-find bugs. One class of bugs results from conflating ordered and unordered collections as lookup keys. It is difficult to assess program correctness when the ordered/unordered distinction has been abstracted away. A second class of errors can arise when the original object mutates and gets out-of-sync with its frozen counterpart.

* For a rare app needing mutable lookup keys, a simple recipe would suffice:

    freeze_pairs = [(list, tuple), (set, frozenset)]

    def freeze(obj):
        try:
            hash(obj)
        except TypeError:
            for sourcetype, desttype in freeze_pairs:
                if isinstance(obj, sourcetype):
                    return desttype(obj)
            raise
        else:
            return obj

Unlike the PEP, the recipe works with older pythons and is trivially easy to extend to include other containers.

* The name freeze is problematic because it suggests an in-place change. Instead, the proposed mechanism creates a new object. In contrast, explicit conversions like tuple(l) or frozenset(s) are obvious about their running time, space consumed, and new object identity.

Overall, I'm -1 on the PEP. Like a bad C macro, the proposed abstraction hides too much. We lose critical distinctions of ordered vs unordered, mutable vs immutable, new objects vs in-place change, etc. Without compelling use cases, the mechanism smells like a hyper-generalization. Raymond
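The two bug classes Raymond describes can be reproduced with his own recipe. The following Python 3 sketch repeats the recipe unchanged and then shows a frozen key going stale after the original list mutates, and a list and a set with the same members freezing to unequal keys.

```python
freeze_pairs = [(list, tuple), (set, frozenset)]


def freeze(obj):
    try:
        hash(obj)
    except TypeError:
        for sourcetype, desttype in freeze_pairs:
            if isinstance(obj, sourcetype):
                return desttype(obj)
        raise  # re-raise the TypeError for unsupported containers
    else:
        return obj


# Out-of-sync copies: the key is a snapshot of the list at freeze time.
tags = ['a', 'b']
index = {freeze(tags): 'first'}
tags.append('c')
stale_lookup_hits = freeze(tags) in index    # the re-frozen key misses

# Ordered vs unordered: same members, different kinds of key.
same_key = freeze([1, 2]) == freeze({1, 2})  # tuple vs frozenset
```

Neither mistake raises an error; lookups simply miss, which is what makes these bugs hard to find.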
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But I could use some validation or feedback on this idea from actual practitioners. +1 from me, too. I'm tempted to say it would be even better if there was a command line option that could be used to force all binary opens to result in bytes, and require all text opens to specify an encoding. I like this idea, too. Presumably plain open(FILENAME, MODE) would then result in a binary open (no encoding specified), which I've wanted for a long time (and which makes sense). But it is a change. Bill
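The rule Phillip and Bill describe (binary opens yield bytes, text opens must name an encoding) can be sketched as a wrapper in modern Python 3 without any command-line switch. strict_open is a hypothetical helper; io.open is the real built-in open.

```python
import io


def strict_open(path, mode='r', encoding=None, **kwargs):
    """Hypothetical open(): binary mode yields bytes, text mode needs an encoding."""
    if 'b' in mode:
        if encoding is not None:
            raise ValueError("binary mode does not take an encoding")
        return io.open(path, mode, **kwargs)
    if encoding is None:
        raise TypeError("text mode requires an explicit encoding")
    return io.open(path, mode, encoding=encoding, **kwargs)
```

The check runs before the file is touched, so a missing encoding is reported even for paths that do not exist.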
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Python should allow strings to contain any Unicode character and should be indexable yielding characters rather than half characters. Therefore Python strings should appear to be UTF-32. +1. Bill
[Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
I'm implementing a string-like object in an extension module and trying to make it as interoperable with the standard string object as possible. To do this I'm implementing the relevant slots and the buffer interface. For most things this is fine, but there are a small number of methods in stringobject.c that don't use the buffer interface - and I don't understand why. Specifically... string_contains() doesn't, which means that... MyString("foo") in "foobar" ...doesn't work. s.join(sequence) only allows sequence to contain string or unicode objects. s.strip([chars]) only allows chars to be a string or unicode object. Same for lstrip() and rstrip(). s.ljust(width[, fillchar]) only allows fillchar to be a string object (not even a unicode object). Same for rjust() and center(). Other methods happily allow types that support the buffer interface as well as string and unicode objects. I'm happy to submit a patch - I just wanted to make sure that this behaviour wasn't intentional for some reason. Thanks, Phil
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote: I'm implementing a string-like object in an extension module and trying to make it as interoperable with the standard string object as possible. To do this I'm implementing the relevant slots and the buffer interface. For most things this is fine, but there are a small number of methods in stringobject.c that don't use the buffer interface - and I don't understand why. Specifically... string_contains() doesn't, which means that... MyString("foo") in "foobar" ...doesn't work. s.join(sequence) only allows sequence to contain string or unicode objects. s.strip([chars]) only allows chars to be a string or unicode object. Same for lstrip() and rstrip(). s.ljust(width[, fillchar]) only allows fillchar to be a string object (not even a unicode object). Same for rjust() and center(). Other methods happily allow types that support the buffer interface as well as string and unicode objects. I'm happy to submit a patch - I just wanted to make sure that this behaviour wasn't intentional for some reason. A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. We need to support this better in Python 3000; but I'm not sure you can do much better in Python 2.x; subclassing from str is unlikely to work for you because then too many places are going to assume the internal representation is also the same as for str. -- --Guido van Rossum
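For comparison with Guido's remark about Python 3000: in Python 3 the ambiguity he describes disappears because text (str) and binary data (bytes) are separate types. The bytes methods accept any object exporting the buffer protocol ("bytes-like" objects) for the operations Phil lists, while str methods reject bytes outright. A small sketch:

```python
data = b"foobar"

# Substring test with a bytearray, which is bytes-like but not bytes.
found = bytearray(b"oo") in data

# join() accepts any mix of bytes-like items.
joined = b"-".join([bytearray(b"a"), memoryview(b"b"), b"c"])

# The chars argument of strip() can be a memoryview.
stripped = data.strip(memoryview(b"fr"))

# str methods do not guess: a bytes argument is a TypeError.
try:
    "foobar".strip(b"fr")
    str_accepts_bytes = True
except TypeError:
    str_accepts_bytes = False
```

Here found is True, joined is b"a-b-c" and stripped is b"ooba".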
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
Guido van Rossum wrote: A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. iirc, SRE solves that by comparing the length of the sequence with the number of bytes in the buffer. if length == bytes, it's an 8-bit string; if length*sizeof(Py_UNICODE) == bytes, it's a Unicode string. /F
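The heuristic Fredrik attributes to SRE reduces to comparing two numbers. The following is a Python sketch of that comparison only; guess_buffer_kind and its "unknown" fallback are assumptions of this example, and char_size stands in for sizeof(Py_UNICODE), which was 2 on narrow builds and 4 on wide builds.

```python
def guess_buffer_kind(length, nbytes, char_size=2):
    """Classify a buffer by comparing its length with its byte size.

    length    -- number of items the object reports (len())
    nbytes    -- size of its buffer in bytes
    char_size -- bytes per Unicode code unit (sizeof(Py_UNICODE))
    """
    if length == nbytes:
        return "8-bit"
    if length * char_size == nbytes:
        return "unicode"
    return "unknown"
```

Note that an empty object (0 items, 0 bytes) is classified as 8-bit, since both tests match and the first one wins.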
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
Guido van Rossum wrote: On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote: I'm implementing a string-like object in an extension module and trying to make it as interoperable with the standard string object as possible. To do this I'm implementing the relevant slots and the buffer interface. For most things this is fine, but there are a small number of methods in stringobject.c that don't use the buffer interface - and I don't understand why. Specifically... string_contains() doesn't, which means that... MyString("foo") in "foobar" ...doesn't work. s.join(sequence) only allows sequence to contain string or unicode objects. s.strip([chars]) only allows chars to be a string or unicode object. Same for lstrip() and rstrip(). s.ljust(width[, fillchar]) only allows fillchar to be a string object (not even a unicode object). Same for rjust() and center(). Other methods happily allow types that support the buffer interface as well as string and unicode objects. I'm happy to submit a patch - I just wanted to make sure that this behaviour wasn't intentional for some reason. A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. This situation is a little better than that: the buffer interface has a slot called getcharbuffer which is what the string methods use in case they find that a string argument is not of type str or unicode. A few don't, but I guess we could fix this. str.split(), .[lr]strip() all support the getcharbuffer interface. str.join() currently doesn't. The Unicode object also leaves out a few cases, among those the ones you mentioned. If it's better for inter-op, I guess we should make an effort and let all of them support the getcharbuffer interface.
We need to support this better in Python 3000; but I'm not sure you can do much better in Python 2.x; subclassing from str is unlikely to work for you because then too many places are going to assume the internal representation is also the same as for str. As a first step, I'd suggest implementing the getcharbuffer slot. That will already go a long way. -- Marc-Andre Lemburg
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Guido van Rossum wrote: A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. This situation is a little better than that: the buffer interface has a slot called getcharbuffer which is what the string methods use in case they find that a string argument is not of type str or unicode. I stand corrected! As a first step, I'd suggest implementing the getcharbuffer slot. That will already go a long way. Phil, if anything still doesn't work after doing what Marc-Andre says, those would be good candidates for fixes! -- --Guido van Rossum
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote: On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Guido van Rossum wrote: A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. This situation is a little better than that: the buffer interface has a slot called getcharbuffer which is what the string methods use in case they find that a string argument is not of type str or unicode. I stand corrected! As a first step, I'd suggest implementing the getcharbuffer slot. That will already go a long way. Phil, if anything still doesn't work after doing what Marc-Andre says, those would be good candidates for fixes! I have implemented getcharbuffer - I was highlighting those methods where the getcharbuffer implementation was ignored. I'll put a patch together. Phil
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Neil Hodgson wrote: For Windows, the code will get a little uglier, needing to perform an allocation/encoding and deallocation more often than at present but I don't think there will be a speed degradation as Windows is currently performing a conversion from 8 bit to UTF-16 inside many system calls. [...] For indexing UTF-16, a flag could be set to show if the string is all in the base plane and if not, an index could be constructed when and if needed. There are many design alternatives: one option would be to support *three* internal representations in a single type, generating the others as needed from whichever one already exists. The default, initial representation might be UTF-8, with UCS-4 only being generated when indexing occurs, and UCS-2 only being generated when the API requires it. On concatenation, always concatenate just one representation: either one that is already present in both operands, else UTF-8. It'd be good to get some feel for what proportion of string operations performed require indexing. Many, such as startswith, split, and concatenation don't require indexing. The proportion of operations that use indexing to scan strings would also be interesting as adding a (currentIndex, currentOffset) cursor to string objects would be another approach. Indeed. My guess is that indexing is more common than you think, especially when iterating over the string. Of course, iteration could also operate on UTF-8, if you introduced string iterator objects. Regards, Martin
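Martin's point that iteration could operate directly on UTF-8 can be illustrated with a sequential decoder: walking the bytes from left to right yields each character without ever computing a character index. The following Python 3 sketch (iter_utf8 is a hypothetical name) assumes well-formed input and does no validation of continuation bytes.

```python
def iter_utf8(data):
    """Yield the characters of UTF-8 encoded bytes, left to right."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        # The lead byte determines the sequence length and its payload bits.
        if lead < 0x80:
            width, cp = 1, lead
        elif lead >> 5 == 0b110:
            width, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            width, cp = 3, lead & 0x0F
        else:
            width, cp = 4, lead & 0x07
        # Each continuation byte contributes its low six bits.
        for byte in data[i + 1:i + width]:
            cp = (cp << 6) | (byte & 0x3F)
        yield chr(cp)
        i += width
```

Iteration costs O(1) per character, whereas jumping to the k-th character of a UTF-8 buffer requires a scan from the start (or a cursor like the one Neil mentions).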
Re: [Python-Dev] New codecs checked in
Walter Dörwald wrote: Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put a complete decoding_table into koi8_u.py? Not sure. Unfortunately, the tables being used as source are not part of the Python source, so nobody except MAL can faithfully regenerate them. If they were part of the Python source, explicitly adding one for KOI8-U would certainly be feasible. I.e. change: decoding_map.update({ 0x0080: 0x0402, # CYRILLIC CAPITAL LETTER DJE Hmm. I was suggesting to remove decoding_map completely, in which case neither the current form nor your suggested cosmetic change would survive. to decoding_table = ( u'\x00' # 0x00 - U+0000 NULL Using U+ in comments to denote the codepoints is a good idea, anyway. Regards, Martin
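Martin's suggestion replaces the decoding_map dictionary with a decoding_table: a 256-character string in which position b holds the character that byte b decodes to. In current CPython the generated charmap codecs (for example encodings.cp1252) work this way and derive their encoding table with codecs.charmap_build, an undocumented helper in the codecs module. The following Python 3 sketch patches a Latin-1 identity table at one position purely for illustration.

```python
import codecs

# Position = byte value, character = what that byte decodes to.
decoding_table = ''.join(chr(b) for b in range(256))   # Latin-1 identity
decoding_table = (decoding_table[:0xA4] + '\u0454'      # illustrative change
                  + decoding_table[0xA5:])

# Invert the table into the compact encoding map the charmap encoder uses.
encoding_table = codecs.charmap_build(decoding_table)

text, consumed = codecs.charmap_decode(b'A\xa4', 'strict', decoding_table)
data, _ = codecs.charmap_encode(text, 'strict', encoding_table)
```

Here text is 'A\u0454' and data round-trips to b'A\xa4'. Since the encoder uses the built map rather than a dictionary, editing a module-level mapping afterwards has no effect, which is the explicit-breakage argument Martin makes in the thread.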
Re: [Python-Dev] New codecs checked in
M.-A. Lemburg wrote: I just left them in because I thought they wouldn't do any harm and might be useful in some applications. Removing them where not directly needed by the codec would not be a problem. I think the memory usage they cause is measurable (I estimated 4KiB per dictionary). More importantly, people apparently currently change the dictionaries we provide and expect the codecs to automatically pick up the modified mappings. It would be better if the breakage is explicit (i.e. they get an AttributeError on the variable) instead of implicit (their changes to the mapping simply have no effect anymore). KOI8-U is not available as a mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there. I think we should come up with mapping tables for the additional codecs as well, and maintain them in the CVS. This also applies to things like rot13. I'll rerun the creation with the above changes sometime this week. I hope I can finish my encoding routine shortly, which again results in changes to the codecs (replacing the encoding dictionaries with other lookup tables). Regards, Martin
Re: [Python-Dev] New codecs checked in
M.-A. Lemburg wrote: I had to create three custom mapping files for cp1140, koi8-u and tis-620. Can you please publish the files you have used somewhere? They would best go into the Python CVS. Regards, Martin
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
There are many design alternatives: one option would be to support *three* internal representations in a single type, generating the others as needed from whichever one already exists. The default, initial representation might be UTF-8, with UCS-4 only being generated when indexing occurs, and UCS-2 only being generated when the API requires it. On concatenation, always concatenate just one representation: either one that is already present in both operands, else UTF-8. Wouldn't it be simpler to use:
- one-byte representation if every character <= 0xFF
- two-byte representation if every character <= 0xFFFF
- four-byte representation otherwise
Then combining several strings means using the larger representation as a result (*). In practice, most use cases will not involve the four-byte representation. (*) a heuristic can be invented so that, when producing a smaller string (by stripping/slicing/etc.), it will sometimes check whether a narrower representation is possible. For example: store the length of the string when the last check occurred, and do a new check when the length falls below half that value. Regards Antoine.
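Antoine's width-selection rule can be stated in a few lines. The following Python 3 sketch computes only the storage width such a scheme would pick (storage_width and concat_width are hypothetical names); for reference, this 1/2/4-byte scheme is essentially what Python 3.3 later adopted in PEP 393.

```python
def storage_width(text):
    """Bytes per character needed to store text under Antoine's scheme."""
    highest = max(map(ord, text), default=0)  # empty string -> narrowest
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def concat_width(*parts):
    """Concatenation uses the widest representation among its operands."""
    return max((storage_width(p) for p in parts), default=1)
```

Every character in a given string then occupies the same number of bytes, so indexing stays O(1) regardless of which width was chosen.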
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote: Indeed. My guess is that indexing is more common than you think, especially when iterating over the string. Of course, iteration could also operate on UTF-8, if you introduced string iterator objects. Python's slice-and-dice model pretty much ensures that indexing is common. Almost everything is ultimately represented as indices: regex search results have the index in the API, find()/index() return indices, many operations take a start and/or end index. As long as that's the case, indexing better be fast. Changing the APIs would be much work, although perhaps not impossible for Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index(). Still, the mere existence of __getitem__ and __getslice__ on strings makes it necessary to implement them efficiently. How realistic would it be to drop them? What should replace them? Some kind of abstract pointers-into-strings perhaps, but that seems much more complex. The trick seems to be to support both simple programs manipulating short strings (where indexing is probably the easiest API to understand, and the additional copying is unlikely to cause performance problems), as well as programs manipulating very large buffers containing text and doing sophisticated string processing on them. Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing? -- --Guido van Rossum
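The contrast Guido draws between index-based find() and index-free partition() looks like this in practice (str.partition exists in Python 2.5 and later). Both halves of the sketch split a header line at the first colon; only the first exposes an index to the caller.

```python
line = "Content-Type: text/plain"

# Index-based: find() returns a position that the caller must slice with.
pos = line.find(":")
if pos >= 0:
    key, value = line[:pos], line[pos + 1:].strip()

# Index-free: partition() returns (head, separator, tail) directly.
head, sep, tail = line.partition(":")
if sep:
    key2, value2 = head, tail.strip()
```

When the separator is absent, partition() returns (line, '', ''), so testing sep replaces the pos >= 0 check.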
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
Guido van Rossum wrote:
Changing the APIs would be much work, although perhaps not impossible for Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index().

I think Neil's proposal is not to make them go away, but to implement them less efficiently. For example, if the internal representation is UTF-8, indexing requires linear time, as opposed to constant time. If the internal representation is UTF-16, and you have a flag to indicate whether there are any surrogates in the string, indexing is constant if the flag is false, else linear.

I understand all that. My point is that it's a bad idea to offer an indexing operation that isn't O(1).

Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing?

There are different design goals conflicting here:
- some think: all my data is ASCII, so I want to only use one byte per character.
- others think: all my data goes to the Windows API, so I want to use 2 bytes per character.
- yet others think: I want all of Unicode, with proper, efficient indexing, so I want four bytes per char.

I doubt the last one though. Probably they really don't want efficient indexing; they want to perform higher-level operations that currently are only possible using efficient indexing or slicing. With the right API, perhaps they could work just as efficiently with an internal representation of UTF-8.

It's not so much a matter of API as a matter of internal representation. The API doesn't have to change (except for the very low-level C API that directly exposes Py_UNICODE*, perhaps).

I think the API should reflect the representation *to some extent*, namely it shouldn't claim to have operations that are typically thought of as O(1) that can only be implemented as O(n).
An internal representation of UTF-8 might make everyone happy except heavy Windows users; but it requires changes to the API so people won't be writing Python 2.x-style string slinging code.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
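A rough Python sketch of the costs Martin describes: UTF-8 indexing has to scan from the start, while a UTF-16 string carrying a "has surrogates" flag can index in O(1) as long as the flag is false. UTF16Str and utf8_index are hypothetical names; only non-negative indices are handled, and the scans are deliberately naive.

```python
def utf8_index(data, i):
    """Return code point i of UTF-8 bytes data by scanning from the start: O(n)."""
    count = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte: a new code point starts here
            count += 1
            if count == i:
                end = pos + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("index out of range")


class UTF16Str:
    """UTF-16 storage plus a flag recording whether any surrogates occur."""

    def __init__(self, text):
        self.units = text.encode("utf-16-le")
        self.n_units = len(self.units) // 2
        self.has_surrogates = any(ord(c) > 0xFFFF for c in text)

    def _unit(self, k):
        return int.from_bytes(self.units[2 * k:2 * k + 2], "little")

    def __getitem__(self, i):
        if not self.has_surrogates:
            # One 16-bit unit per code point: constant-time indexing.
            if not 0 <= i < self.n_units:
                raise IndexError("index out of range")
            return chr(self._unit(i))
        # Surrogates present: walk from the start, counting a pair as one
        # code point, so indexing becomes linear.
        k = 0
        while k < self.n_units:
            step = 2 if 0xD800 <= self._unit(k) <= 0xDBFF else 1
            if i == 0:
                return self.units[2 * k:2 * (k + step)].decode("utf-16-le")
            i -= 1
            k += step
        raise IndexError("index out of range")
```

A single non-BMP character anywhere in a UTF16Str switches every lookup onto the linear path, which is exactly the kind of hidden O(n) behind an O(1)-looking operation that Guido objects to.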
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg:
Unicode has the concept of combining code points, e.g. you can store an é (e with an accent) as e + '. Now if you slice off the accent, you'll break the character that you encoded using combining code points.
...
next_<indextype>(u, index) -> integer
Returns the Unicode object index for the start of the next <indextype> found after u[index] or -1 in case no next element of this type exists.

Should entity breakage be further discouraged by returning a slice here rather than an object index? Something like:

    i = first_grapheme(u)
    x = 0
    while x < width and u[i] != "\n":
        x, _ = draw(u[i], (x, y))
        i = next_grapheme(u, i)

Neil
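A sketch of Neil's slice-returning variant, assuming a grapheme can be approximated as one base character followed by any characters whose unicodedata.combining() class is non-zero. This is a simplification, not the full Unicode grapheme cluster rules (UAX #29). first_grapheme and next_grapheme follow the hypothetical names in Neil's message; _cluster_end and graphemes are illustrative additions.

```python
import unicodedata


def _cluster_end(u, start):
    """Index just past the base character at start and its combining marks."""
    end = start + 1
    while end < len(u) and unicodedata.combining(u[end]):
        end += 1
    return end


def first_grapheme(u):
    """Slice covering the first (approximate) grapheme of u, or None if empty."""
    return slice(0, _cluster_end(u, 0)) if u else None


def next_grapheme(u, current):
    """Slice covering the grapheme after the slice current, or None at the end."""
    start = current.stop
    if start >= len(u):
        return None
    return slice(start, _cluster_end(u, start))


def graphemes(u):
    """Yield each approximate grapheme of u as a substring."""
    current = first_grapheme(u)
    while current is not None:
        yield u[current]
        current = next_grapheme(u, current)
```

Because callers only ever hold slices produced by these helpers, a base character and its accent are always taken together, so u[i] can never split "e" from a following combining acute.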
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
- yet others think: I want all of Unicode, with proper, efficient indexing, so I want four bytes per char.

I doubt the last one though. Probably they really don't want efficient indexing; they want to perform higher-level operations that currently are only possible using efficient indexing or slicing. With the right API, perhaps they could work just as efficiently with an internal representation of UTF-8.

I just got mail this morning from a researcher who wants exactly what Martin described, and wondered why the default MacPython 2.4.2 didn't provide it by default. :-)

Bill
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/24/05, Bill Janssen [EMAIL PROTECTED] wrote:
- yet others think: I want all of Unicode, with proper, efficient indexing, so I want four bytes per char.

I doubt the last one though. Probably they really don't want efficient indexing; they want to perform higher-level operations that currently are only possible using efficient indexing or slicing. With the right API, perhaps they could work just as efficiently with an internal representation of UTF-8.

I just got mail this morning from a researcher who wants exactly what Martin described, and wondered why the default MacPython 2.4.2 didn't provide it by default. :-)

Oh, I don't doubt that they want it. But often they don't *need* it, and the higher-level goal they are trying to accomplish can be dealt with better in a different way. (Sort of my response to people asking for static typing in Python as well. :-)

Did they tell you what they were trying to do that MacPython 2.4.2 wouldn't let them, beyond "represent a large Unicode string as an array of 4-byte integers"?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido van Rossum wrote:
I think the API should reflect the representation *to some extent*, namely it shouldn't claim to have operations that are typically thought of as O(1) that can only be implemented as O(n).

Maybe a compromise could be reached by using a B-tree of chunks or something, so indexing is O(log n). Not as good as O(1) but a lot better than O(n).

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
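A flat, read-only stand-in for Greg's suggestion, sketched in Python: text is stored as UTF-8 chunks together with a sorted table of each chunk's starting code-point index, so a lookup is a binary search (O(log n) in the number of chunks) plus a scan inside one chunk. This is not a B-tree; a real B-tree would also keep insertions and deletions at O(log n). ChunkedStr is a hypothetical name, and accumulate(..., initial=0) assumes Python 3.8 or later.

```python
import bisect
from itertools import accumulate


class ChunkedStr:
    """Read-only text stored as UTF-8 chunks with a table of chunk starts."""

    def __init__(self, pieces):
        pieces = [p for p in pieces if p]  # skip empty chunks
        self.chunks = [p.encode("utf-8") for p in pieces]
        lengths = [len(p) for p in pieces]  # code points per chunk
        # starts[j] is the code-point index of the first character of chunk j.
        self.starts = list(accumulate(lengths, initial=0))[:-1]
        self.length = sum(lengths)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if not 0 <= i < self.length:
            raise IndexError("ChunkedStr index out of range")
        # Binary search over chunk starts: O(log number_of_chunks).
        j = bisect.bisect_right(self.starts, i) - 1
        # Decode only the one chunk that contains position i.
        return self.chunks[j].decode("utf-8")[i - self.starts[j]]
```

Chunks may have different lengths (as they would after edits in a tree), which is why the lookup searches the start table rather than dividing by a fixed chunk size.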
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido van Rossum wrote:
Python's slice-and-dice model pretty much ensures that indexing is common. Almost everything is ultimately represented as indices: regex search results have the index in the API, find()/index() return indices, many operations take a start and/or end index.

Maybe the idea of string views should be reconsidered in light of this. It's been criticised on the grounds that its use could keep large strings alive longer than needed, but if operations that currently return indices instead returned string views, this wouldn't be any more of a concern than it is now, especially if there is an easy way to explicitly materialise the view as an independent string when wanted.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
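A minimal Python sketch of string views in the spirit of Greg's message, where find() returns a view instead of an index. StrView and its methods are hypothetical; str() materialises an ordinary string, and until then every view holds a reference to the base string, which is the lifetime concern Greg mentions.

```python
class StrView:
    """Read-only view of base[start:stop] that shares the base string."""

    __slots__ = ("base", "start", "stop")

    def __init__(self, base, start=0, stop=None):
        self.base = base  # keeps the whole base string alive
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Materialise: an ordinary str holding just the viewed characters.
        return self.base[self.start:self.stop]

    def find(self, sub):
        """Like str.find, but return a view of the match, or None."""
        pos = self.base.find(sub, self.start, self.stop)
        if pos < 0:
            return None
        return StrView(self.base, pos, pos + len(sub))

    def before(self, other):
        """The part of this view preceding the view other."""
        return StrView(self.base, self.start, other.start)

    def after(self, other):
        """The part of this view following the view other."""
        return StrView(self.base, other.stop, self.stop)
```

For example, with v = StrView("key=value; other"), str(v.after(v.find("="))) is "value; other", and no integer position ever appears in the calling code.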
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido writes:
Oh, I don't doubt that they want it. But often they don't *need* it, and the higher-level goal they are trying to accomplish can be dealt with better in a different way. (Sort of my response to people asking for static typing in Python as well. :-)

I suppose that's true. But what if they're not smart enough to figure out that better, different, way? I doubt you intend Python to be sort of the Rubik's cube of programming...

And no, he didn't say why he wanted the ability to represent a Unicode string as an array of 4-byte integers. Though I know he's doing something with the Deseret Alphabet, translating some early work on American Indian culture that was transcribed in that character set.

Bill
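For a sense of what the researcher was up against: Deseret letters live outside the Basic Multilingual Plane (U+10400 to U+1044F), so UTF-16 needs a surrogate pair for each one, while UTF-32 stores each in a single 4-byte unit. The helper names below are hypothetical. In current Python 3, len() counts code points; on a 2005-era narrow (UCS-2) build, sys.maxunicode was 0xFFFF and len() counted UTF-16 units instead.

```python
import struct
import sys

# True on a "wide" (UCS-4) build, and on every Python 3.3 or later.
WIDE_BUILD = sys.maxunicode > 0xFFFF


def code_unit_counts(text):
    """(len(text), UTF-16 code units, UTF-32 code units) for text."""
    return (len(text),
            len(text.encode("utf-16-le")) // 2,
            len(text.encode("utf-32-le")) // 4)


def as_uint32_array(text):
    """Text as a tuple of 4-byte unsigned code points, via UTF-32."""
    data = text.encode("utf-32-le")
    return struct.unpack("<%dI" % (len(data) // 4), data)


# Two Deseret letters, U+10400 and U+10437 (both outside the BMP).
word = "\U00010400\U00010437"
```

On current Python 3, code_unit_counts(word) is (2, 4, 2): two characters, but four UTF-16 code units, which is why indexing a narrow build would have handed back half characters.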
Re: [Python-Dev] AST branch is in?
On Fri, 21 Oct 2005 18:32:22 + (UTC) nas at arctrix.com (Neil Schemenauer) wrote:
Does it just allow us to do new and interesting manipulations of the code during compilation?

Well, that's a pretty big deal, IMHO. For example, adding pychecker-like functionality should be straightforward now. I also hope some of the namespace optimizations get explored (e.g. PEP 267).

Is there a Python interface?

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com
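In later Python versions the compiler's AST is exposed through the standard ast module. Below is a hypothetical pychecker-style check built on it, flagging `== None` and `!= None` comparisons; it assumes Python 3.8 or later, where the None literal parses as ast.Constant, and only inspects the right-hand operand of each comparison.

```python
import ast


class NoneComparisonChecker(ast.NodeVisitor):
    """Collect (line, message) pairs for == None / != None comparisons."""

    def __init__(self):
        self.warnings = []

    def visit_Compare(self, node):
        # ops[k] compares the previous operand with comparators[k].
        for op, right in zip(node.ops, node.comparators):
            if (isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant)
                    and right.value is None):
                self.warnings.append((node.lineno, "comparison to None; use 'is'"))
        self.generic_visit(node)


def check(source):
    """Parse source and return the checker's warnings."""
    checker = NoneComparisonChecker()
    checker.visit(ast.parse(source))
    return checker.warnings
```

For example, check("if x == None:\n    pass\n") reports one warning on line 1, while the same code written with "is None" reports nothing.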