date:20051024

Neil Hodgson wrote:
I'd like to more tightly define Unicode strings for Python 3000.
 Currently, Unicode strings may be implemented with either 2 byte
 (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
 contain any Unicode character and should be indexable yielding
 characters rather than half characters. Therefore Python strings
 should appear to be UTF-32. There could still be multiple
 implementations (using UTF-16 or UTF-8) to preserve space but all
 implementations should appear to be the same apart from speed and
 memory use.

That's very tricky. If you have multiple implementations, you make
usage at the C API difficult. If you make it either UTF-8 or UTF-32,
you make PythonWin difficult. If you make it UTF-16, you make indexing
difficult.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

Phillip J. Eby wrote:
 I'm tempted to say it would be even better if there was a command line 
 option that could be used to force all binary opens to result in bytes, and 
 require all text opens to specify an encoding.

For Python 3000? -1. There shouldn't be command line switches that have
that much importance.

For Python 2.x? Well, we are not supposed to discuss this.

Regards,
Martin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson

Martin v. Löwis:

 That's very tricky. If you have multiple implementations, you make
 usage at the C API difficult. If you make it either UTF-8 or UTF-32,
 you make PythonWin difficult. If you make it UTF-16, you make indexing
 difficult.

   For Windows, the code will get a little uglier, needing to perform
an allocation/encoding and deallocation more often than at present but
I don't think there will be a speed degradation as Windows is
currently performing a conversion from 8 bit to UTF-16 inside many
system calls. To minimize the cost of allocation, Python could copy
Windows in keeping a small number of commonly sized preallocated
buffers handy.

   For indexing UTF-16, a flag could be set to show if the string is
all in the base plane and if not, an index could be constructed when
and if needed. It'd be good to get some feel for what proportion of
string operations performed require indexing. Many, such as
startswith, split, and concatenation don't require indexing. The
proportion of operations that use indexing to scan strings would also
be interesting as adding a (currentIndex, currentOffset) cursor to
string objects would be another approach.
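
The flag-plus-lazy-index idea can be sketched in Python 3 roughly as follows. This is only an illustration under stated assumptions: the class name Utf16String and its attributes are hypothetical, and nothing here reflects how CPython stores strings.

```python
import struct


class Utf16String:
    """Hypothetical string stored as UTF-16 code units but indexed by character."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # One 16-bit code unit per tuple slot.
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Flag: True when no surrogates occur, so unit index == character index.
        self.all_bmp = not any(0xD800 <= u <= 0xDFFF for u in self.units)
        self._index = None  # built lazily, only when first needed

    def _build_index(self):
        # Record the offset of the code unit that starts each character.
        index, i = [], 0
        while i < len(self.units):
            index.append(i)
            # A high surrogate starts a two-unit character.
            i += 2 if 0xD800 <= self.units[i] <= 0xDBFF else 1
        self._index = index

    def __len__(self):
        if self.all_bmp:
            return len(self.units)
        if self._index is None:
            self._build_index()
        return len(self._index)

    def __getitem__(self, n):
        if self.all_bmp:
            return chr(self.units[n])  # fast path: direct lookup
        if self._index is None:
            self._build_index()
        start = self._index[n]
        unit = self.units[start]
        if 0xD800 <= unit <= 0xDBFF:
            # Combine the surrogate pair into one code point.
            low = self.units[start + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)
```

Strings that stay inside the base plane never pay for the index; the cost appears only for strings that actually contain surrogate pairs, and only on the first indexing operation.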

   Neil

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/23/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 Very nice indeed. I'd be more supportive if it was defined as a new statement
 such as create with the syntax:

create TYPE NAME(ARGS):
  BLOCK

I like it, but it would require a new keyword. Alternatively, one
could abuse 'def':

def  TYPE NAME(ARGS):
  BLOCK

but then people would likely be confused, as Skip was earlier in this thread,
so I guess 'def' is not an option.

IMHO a new keyword could be justified for such a powerful feature,
but only Guido's opinion counts on these matters ;)

Anyway I expected people to criticize the proposal as too powerful and
dangerously close to Lisp macros.

 Michele Simionato

Re: [Python-Dev] int(string)

2005-10-24 Thread Fredrik Lundh

Alan McIntyre wrote:

 When running make test I get some errors in test_array and
 test_compile that did not occur in the build from CVS.  Given the inputs
 to long() have '.' characters in them, I assume that these tests really
 should be failing as implemented, but I haven't dug into them to see
 what's going on:

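 ======================================================================
 ERROR: test_repr (__main__.FloatTest)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "Lib/test/test_array.py", line 187, in test_repr
     self.assertEqual(a, eval(repr(a), {"array": array.array}))
 ValueError: invalid literal for long(): 100.0

 ======================================================================
 ERROR: test_repr (__main__.DoubleTest)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "Lib/test/test_array.py", line 187, in test_repr
     self.assertEqual(a, eval(repr(a), {"array": array.array}))
 ValueError: invalid literal for long(): 100.0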

I don't have the latest cvs, but in my copy of test_array, the input to those
two eval calls are

 array('f', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0,
10.0, -100.0])

and

 array('d', [-42.0, 0.0, 42.0, 10.0, -100.0, -42.0, 0.0, 42.0,
10.0, -100.0])

respectively.  if either of those gives invalid literal for long, something's
seriously broken.

does a plain

a = -100.0

still work on your machine?
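
For reference, the round trip that test_repr relies on can be reproduced in isolation. This is a small Python 3 sketch using a shortened value list, not the actual test code:

```python
import array

for typecode in ("f", "d"):
    a = array.array(typecode, [-42.0, 0.0, 42.0, 10.0, -100.0])
    # repr(a) looks like "array('d', [-42.0, ...])", so eval() needs the
    # name "array" bound to the array constructor.
    b = eval(repr(a), {"array": array.array})
    assert a == b
```

If this fails with a ValueError, the problem is in float literal parsing rather than in the array module.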

/F




Re: [Python-Dev] Definining properties - a use case for class decorators?


Michele Simionato [EMAIL PROTECTED] wrote:
 
 On 10/23/05, Nick Coghlan [EMAIL PROTECTED] wrote:
  Very nice indeed. I'd be more supportive if it was defined as a new 
  statement
  such as create with the syntax:
 
 create TYPE NAME(ARGS):
   BLOCK
 
 I like it, but it would require a new keyword. Alternatively, one
 could abuse 'def':
 
 def  TYPE NAME(ARGS):
   BLOCK
 
 but then people would likely be confused as Skip was, earlier in this thread,
 so I guess 'def' is a not an option.
 
 IMHO a new keyword could be justified for such a powerful feature,
 but only Guido's opinion counts on this matters ;)
 
 Anyway I expected people to criticize the proposal as too powerful and
 dangerously close to Lisp macros.

I would criticise it for being dangerously close to worthless.  With the
minor support code that I (and others) have offered, no new syntax is
necessary.

You can get the same semantics with...

class NAME(_(TYPE), ARGS):
    BLOCK

And a suitably defined _.  Remember, not every X line function should be
made a builtin or syntax.

 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 I would criticise it for being dangerously close to worthless.  With the
 minor support code that I (and others) have offered, no new syntax is
 necessary.

 You can get the same semantics with...

 class NAME(_(TYPE), ARGS):
 BLOCK

 And a suitably defined _.  Remember, not every X line function should be
 made a builtin or syntax.

  - Josiah

Could you re-read my original message, please? Sugar is *everything*
in this case. If the functionality is to be implemented via a __metaclass__
hook, then it should be considered a hack that nobody in his right mind
should use. OTOH, if there is a specific syntax for it, then it means
the usage has the benediction of the BDFL. This would be a HUGE change.
For instance, I would never abuse metaclasses for that, whereas I
would freely use a 'create' statement.

   Michele Simionato

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

Neil Hodgson wrote:
Guido van Rossum:

Folks, please focus on what Python 3000 should do.

I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But I could
use some validation or feedback on this idea from actual
practitioners.

I'd like to more tightly define Unicode strings for Python 3000.
Currently, Unicode strings may be implemented with either 2 byte
(UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
contain any Unicode character and should be indexable yielding
characters rather than half characters. Therefore Python strings
should appear to be UTF-32. There could still be multiple
implementations (using UTF-16 or UTF-8) to preserve space but all
implementations should appear to be the same apart from speed and
memory use.

There seems to be a general misunderstanding here: even if you
have UCS4 storage, it is still possible to slice a Unicode
string in a way that makes it impossible to render correctly.

Unicode has the concept of combining code points, e.g. you can
store an é (e with an accent) as e followed by a combining
accent (U+0301). Now if you slice off the accent, you'll break
the character that you encoded using combining code points.

Note that combining code points are rather common in encodings
of Asian scripts, so this is not an artificial example.
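
A short Python 3 illustration of the slicing problem, using the standard unicodedata module. The helper split_clusters is a hypothetical name and a deliberate simplification: it only groups combining marks with the preceding base character and is not full grapheme segmentation.

```python
import unicodedata

composed = "\u00e9"                                   # é as one code point
decomposed = unicodedata.normalize("NFD", composed)   # "e" + U+0301

# Slicing by code point cuts the accent off and leaves a bare "e".
sliced = decomposed[:1]


def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch       # attach the mark to its base character
        else:
            clusters.append(ch)
    return clusters


clusters = split_clusters("cafe\u0301s")
```

Indexing into `clusters` instead of the raw string keeps the accent attached to its base letter.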

Some time ago I proposed a new module called unicodeindex
to help with indexing. It would solve most of the indexing
issues you run into when dealing with Unicode. I've attached
it to this email for reference.

Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Walter Dörwald

Martin v. Löwis wrote:

 M.-A. Lemburg wrote:
 
I've checked in a whole bunch of newly generated codecs
which now make use of the faster charmap decoding variant added
by Walter a short while ago.

Please let me know if you find any problems.
 
 I think we should work on eliminating the decoding_map variables.
 There are some codecs which rely on them being present in other codecs
 (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
 to use, say
 
 decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
     0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
     0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
     0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
     0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
     0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
     0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
     0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
     0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
 })
 
 With all these cross-references gone, the decoding_maps could also go.

Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put 
a complete decoding_table into koi8_u.py?

I'd like to suggest a small cosmetic change: gencodec.py should output 
byte values with two hex digits instead of four. This makes it easier to 
see what is a byte value and what is a code point. And it would make 
grepping for stuff simpler.

I.e. change:

decoding_map.update({
 0x0080: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

to

decoding_map.update({
 0x80: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

and

decoding_table = (
 u'\x00' #  0x0000 -> NULL

to

decoding_table = (
 u'\x00' #  0x00 -> U+0000 NULL

and

encoding_map = {
 0x0000: 0x0000, #  NULL

to

encoding_map = {
 0x0000: 0x00, #  NULL
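
The proposed output format amounts to using different field widths for the two columns. A minimal sketch follows; format_decoding_entry is a hypothetical helper for illustration and is not taken from gencodec.py:

```python
def format_decoding_entry(byte, codepoint, name):
    """Render one decoding_map line: 2 hex digits for the byte, 4 for the code point."""
    return "    0x%02x: 0x%04x, #  %s" % (byte, codepoint, name)


line = format_decoding_entry(0x80, 0x0402, "CYRILLIC CAPITAL LETTER DJE")
print(line)
```

With distinct widths, a search for a four-digit pattern such as 0x0402 can only match code points, never byte values.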

Re: [Python-Dev] New codecs checked in

Walter Dörwald wrote:
 Martin v. Löwis wrote:
 
 M.-A. Lemburg wrote:

 I've checked in a whole bunch of newly generated codecs
 which now make use of the faster charmap decoding variant added
 by Walter a short while ago.

 Please let me know if you find any problems.


 I think we should work on eliminating the decoding_map variables.
 There are some codecs which rely on them being present in other codecs
 (e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
 to use, say

 decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
     0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
     0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
     0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
     0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
     0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
     0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
     0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
     0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
 })

 With all these cross-references gone, the decoding_maps could also go.

I just left them in because I thought they wouldn't do any harm
and might be useful in some applications.

Removing them where not directly needed by the codec would not
be a problem.

 Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
 a complete decoding_table into koi8_u.py?

KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.

 I'd like to suggest a small cosmetic change: gencodec.py should output
 byte values with two hexdigits instead of four. This makes it easier to
 see what is a byte values and what is a codepoint. And it would make
 grepping for stuff simpler.

True.

I'll rerun the creation with the above changes sometime this
week.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 24 2005)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Nick Coghlan

Josiah Carlson wrote:
 You can get the same semantics with...
 
 class NAME(_(TYPE), ARGS):
 BLOCK
 
 And a suitably defined _.  Remember, not every X line function should be
 made a builtin or syntax.

And this would be an extremely fragile hack that is entirely dependent on the 
murky rules regarding how Python chooses the metaclass for the newly created 
class. Ensuring that the metaclass of the class returned by _ was always the 
one chosen would be tricky at best and impossible at worst.

Even if it *could* be done, I'd never want to see a hack like that in 
production code I had anything to do with.

And while writing it with __metaclass__ has precisely the correct semantics, 
that simply isn't as readable as a new block statement would be, nor is it as 
readable as the current major alternatives (e.g., defining and invoking a 
factory function).

An alternative to a completely new function would be to simply allow the 
metaclass to be defined up front, rather than inside the body of the class 
statement:

   class @TYPE NAME(ARGS):
       BLOCK

For example:

   class @Property x():
       def get(self):
           return self._x
       def set(self, value):
           self._x = value
       def delete(self, value):
           del self._x

(I put the metaclass after the keyword, because, unlike a function decorator, 
the metaclass is invoked *before* the class is created, and because you're 
only allowed one explicit metaclass)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

Bengt Richter wrote:
 Please bear with me for a few paragraphs ;-)

Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into constant Unicode objects within the program text.
It's also a nice way to let other people know what kind of
encoding you used to write your comments ;-)

Nothing more.

Once a module is compiled, there's no distinction between
a module using the latin-1 source code encoding or one using
the utf-8 encoding.
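
This can be checked with a small experiment. The sketch below assumes Python 3, where all string literals are Unicode and compile() honours the coding declaration when given bytes; the variable names are illustrative only.

```python
# The same program text, stored on disk in two different encodings.
latin1_source = "# -*- coding: latin-1 -*-\ntext = 'caf\u00e9'\n".encode("latin-1")
utf8_source = "# -*- coding: utf-8 -*-\ntext = 'caf\u00e9'\n".encode("utf-8")

ns_latin1, ns_utf8 = {}, {}
exec(compile(latin1_source, "<latin1>", "exec"), ns_latin1)
exec(compile(utf8_source, "<utf8>", "exec"), ns_utf8)

# The bytes differ, but the compiled string constants are identical.
same_result = ns_latin1["text"] == ns_utf8["text"]
```

The declared encoding only affects how the parser decodes the file; the resulting objects carry no trace of it.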

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Nick Coghlan

Barry Warsaw wrote:
 I've had this PEP laying around for quite a few months.  It was inspired
 by some code we'd written which wanted to be able to get immutable
 versions of arbitrary objects.  I've finally finished the PEP, uploaded
 a sample patch (albeit a bit incomplete), and I'm posting it here to see
 if there is any interest.
 
 http://www.python.org/peps/pep-0351.html

I think it's definitely worth considering. It may also reduce the need for x 
and frozenx builtin pairs. We already have set and frozenset, and the 
various bytes ideas that have been kicked around have generally considered 
the need for a frozenbytes as well.

If freeze was available, then freeze(x(*args)) might serve as a replacement 
for any builtin frozen variants.

I think having dicts and sets automatically invoke freeze would be a mistake, 
because at least one of the following two cases would behave unexpectedly:

   d = {}
   l = []
   d[l] = "Oops!"
   d[l] # Raises KeyError if freeze() isn't also invoked in __getitem__

   d = {}
   l = []
   d[l] = "Oops!"
   l.append(1)
   d[l] # Raises KeyError regardless

Oh, and the PEP's xdict example is even more broken than the PEP implies, 
because two imdicts which compare equal (same contents) may not hash equal 
(different id's).
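
The hashing problem can be shown with a stand-in class. IdHashedDict below is hypothetical and only mimics the behaviour described above (equality by contents, hash by identity); it is not the PEP's imdict implementation.

```python
class IdHashedDict(dict):
    """Hypothetical stand-in: compares by contents, hashes by identity."""

    def __hash__(self):
        return id(self)


a = IdHashedDict(x=1)
b = IdHashedDict(x=1)

table = {a: "found"}
lookup_with_equal_key = table.get(b)   # b == a, yet the lookup misses
```

Because dict lookup compares hashes before it compares keys, `a == b` is not enough: objects that compare equal must also hash equal to be interchangeable as keys.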

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Christopher Armstrong

On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 Should dicts and sets automatically freeze their mutable keys?

 Dictionaries don't have mutable keys,

Since when?

class Foo:
    def __init__(self):
        self.x = 1

f = Foo()
d = {f: 1}
f.x = 2

Maybe you meant something else? I can't think of any way in which
"dictionaries don't have mutable keys" is true. The only rule about
dictionary keys that I know of is that they need to be hashable and
need to be comparable with the equality operator.

--
  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix|-- http://radix.twistedmatrix.com
|  Release Manager, Twisted Project
  \\\V///   |-- http://twistedmatrix.com
   |o O||
wvw-+

Re: [Python-Dev] PEP 351, the freeze protocol


Christopher Armstrong [EMAIL PROTECTED] wrote:
 
 On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
  Should dicts and sets automatically freeze their mutable keys?
 
  Dictionaries don't have mutable keys,
 
 Since when?
 
 Maybe you meant something else? I can't think of any way in which
 dictionaries don't have mutable keys is true. The only rule about
 dictionary keys that I know of is that they need to be hashable and
 need to be comparable with the equality operator.

Good point, I forgot about user-defined classes. I rarely use them as
keys myself: it's all too easy to make a mutable whose hash is dependent
on mutable contents, and an object which you can only find if you
have the exact object is not quite as useful as I generally need. I will,
however, stand by this: a container which is frozen should have its
contents frozen as well.

 - Josiah


Re: [Python-Dev] New codecs checked in

2005-10-24 Thread Walter Dörwald

M.-A. Lemburg wrote:

 Walter Dörwald wrote:
 
Martin v. Löwis wrote:

M.-A. Lemburg wrote:

I've checked in a whole bunch of newly generated codecs
which now make use of the faster charmap decoding variant added
by Walter a short while ago.

Please let me know if you find any problems.

I think we should work on eliminating the decoding_map variables.
There are some codecs which rely on them being present in other codecs
(e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
to use, say

decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
    0x00a4: 0x0454, #   CYRILLIC SMALL LETTER UKRAINIAN IE
    0x00a6: 0x0456, #   CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0x00a7: 0x0457, #   CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0x00ad: 0x0491, #   CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0x00b4: 0x0404, #   CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0x00b6: 0x0406, #   CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0x00b7: 0x0407, #   CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0x00bd: 0x0490, #   CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
})

With all these cross-references gone, the decoding_maps could also go.
 
 I just left them in because I thought they wouldn't do any harm
 and might be useful in some applications.
 
 Removing them where not directly needed by the codec would not
 be a problem.

Recreating them is quite simple via dict(enumerate(decoding_table)) so I 
think we should remove them.
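
A quick Python 3 sketch of that reconstruction, using a tiny made-up table rather than a real codec's. Note that dict(enumerate(...)) yields characters as values, so an ord() step is needed to get the integer-to-integer form that the decoding_map dictionaries use:

```python
# Hypothetical three-entry table: the index is the byte value.
decoding_table = "\x00\x01\u0402"

# Byte value -> character, as suggested above.
char_map = dict(enumerate(decoding_table))

# Byte value -> code point number, matching the 0x0080: 0x0402 style entries.
decoding_map = {byte: ord(char) for byte, char in enumerate(decoding_table)}
```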

Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
a complete decoding_table into koi8_u.py?
 
 KOI8-U is not available as mapping on ftp.unicode.org and
 I only recreated codecs from the mapping files available
 there.

OK, so we'd need something that creates a new decoding table from an old 
one + changes, i.e. something like:

def update_decoding_table(table, new):
    table = list(table)
    for (key, value) in new.iteritems():
        table[key] = unichr(value)
    return u"".join(table)
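
For readers on Python 3, the same helper might look like the sketch below (chr and items() replace unichr and iteritems()). The 256-character identity table is only a stand-in for koi8_r.decoding_table, and no such function is claimed to exist in the codecs module.

```python
def update_decoding_table(table, new):
    """Return a copy of `table` with the byte -> code point overrides in `new` applied."""
    chars = list(table)
    for byte, codepoint in new.items():
        chars[byte] = chr(codepoint)
    return "".join(chars)


# Stand-in base table: byte N decodes to code point N.
base = "".join(chr(i) for i in range(256))
updated = update_decoding_table(base, {
    0xA4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xA6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})
```

The base table is left untouched, so several derived codecs could share it.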

I'd like to suggest a small cosmetic change: gencodec.py should output
byte values with two hexdigits instead of four. This makes it easier to
see what is a byte values and what is a codepoint. And it would make
grepping for stuff simpler.
 
 True.
 
 I'll rerun the creation with the above changes sometime this
 week.

Great, thanks!

Bye,
Walter Dörwald

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michael Hudson

Nick Coghlan [EMAIL PROTECTED] writes:

 Josiah Carlson wrote:
 You can get the same semantics with...
 
 class NAME(_(TYPE), ARGS):
 BLOCK
 
 And a suitably defined _.  Remember, not every X line function should be
 made a builtin or syntax.

 And this would be an extremely fragile hack that is entirely
 dependent on the murky rules regarding how Python chooses the
 metaclass for the newly created class.

Uh, not really.  In the presence of base classes it's always the type
of the first base.  The reason it might not seem this simple is that
most metaclasses end up calling type.__new__ at some point and this
function does more complicated things (such as checking for metaclass
conflict and deferring to the most specific metaclass).  
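
The observable outcome of those rules can be demonstrated with a few classes. This sketch uses Python 3 syntax (metaclass= keyword), where the most derived metaclass among the bases is selected up front; the thread itself concerns Python 2.x, where per the description above the first base's type is called and type.__new__ then defers. All class names are made up for illustration.

```python
class Meta(type):
    pass


class SubMeta(Meta):
    pass


class OtherMeta(type):
    pass


class A(metaclass=Meta):
    pass


class B(metaclass=SubMeta):
    pass


class D(metaclass=OtherMeta):
    pass


# No explicit metaclass: the most specific one among the bases wins.
class C(A, B):
    pass


# Unrelated metaclasses cannot be reconciled and raise TypeError.
conflict = None
try:
    class E(A, D):
        pass
except TypeError as exc:
    conflict = str(exc)
```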

Not sure what the context is here, but I have to butt in when I see
people complicating things which aren't actually that complicated...

Cheers,
mwh

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere. -- Johann Hibschman, comp.lang.scheme

Re: [Python-Dev] PEP 351, the freeze protocol


Nick Coghlan [EMAIL PROTECTED] wrote:
 I think having dicts and sets automatically invoke freeze would be a mistake, 
 because at least one of the following two cases would behave unexpectedly:

I'm pretty sure that the PEP was only asking if one would freeze the
contents of dicts IF the dict was being frozen.

That is, which of the following should be the case:
freeze({1:[2,3,4]}) -> {1:[2,3,4]}
freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
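
The second option, freezing contents recursively, could be sketched as below. This is not PEP 351's freeze(): deep_freeze is a hypothetical function, and since Python has no built-in frozen dict, a frozenset of (key, value) pairs stands in for xdict purely for illustration.

```python
def deep_freeze(obj):
    """Recursively convert common mutable containers to hashable equivalents."""
    if isinstance(obj, dict):
        # Stand-in for a frozen dict: a frozenset of (key, frozen value) pairs.
        return frozenset((key, deep_freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(deep_freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(deep_freeze(item) for item in obj)
    return obj  # assumed to be immutable already


frozen = deep_freeze({1: [2, 3, 4]})
```

Only a deep freeze produces something hashable here; a shallow freeze would leave the inner list mutable and unhashable.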

 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?


Michele Simionato [EMAIL PROTECTED] wrote:
 
 On 10/24/05, Josiah Carlson [EMAIL PROTECTED] wrote:
  I would criticise it for being dangerously close to worthless.  With the
  minor support code that I (and others) have offered, no new syntax is
  necessary.
 
  You can get the same semantics with...
 
  class NAME(_(TYPE), ARGS):
  BLOCK
 
  And a suitably defined _.  Remember, not every X line function should be
  made a builtin or syntax.
 
   - Josiah
 
 Could you re-read my original message, please? Sugar is *everything*
 in this case. If the functionality is to be implemented via a __metaclass__
 hook, then it should be considered a hack that nobody in his right mind
 should use. OTOH, if there is a specific syntax for it, then it means
 this the usage
 has the benediction of the BDFL. This would be a HUGE change.
 For instance, I would never abuse metaclasses for that, whereas I
 would freely use a 'create' statement.

Metaclass abuse?  Oh, I'm sorry, I thought that the point of metaclasses
was to offer a way to make magic happen in a somewhat pragmatic
manner, you know, through metaprogramming.  I would call this particular
use a practical application of standard Python semantics.

Pardon me while I attempt to re-parse your above statement...
If there is a specific syntax for [passing a temporary namespace to a
callable, created by some sort of block mechanism], then [using it for
property creation] has the benediction of the BDFL.

What I'm trying to say is that it already has a no-syntax syntax.  It
uses the magic of metaclasses, but one can make that magic as
explicit as necessary.

class NAME(PassNamespaceFromClassBlock(fcn=TYPE, args=ARGS)):
    BLOCK
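
One way such a helper might be written is sketched here in Python 3, using the metaclass= keyword rather than the base-class trick shown above. PassNamespace, make_property and Point are hypothetical names for illustration, not the support code posted earlier in the thread.

```python
class PassNamespace(type):
    """Metaclass that hands the class body's namespace to `fcn` instead of building a class."""

    def __new__(mcls, name, bases, namespace, fcn=None):
        if fcn is None:
            return super().__new__(mcls, name, bases, namespace)
        # The class block is used purely as a namespace; return whatever fcn builds.
        return fcn(namespace)


def make_property(namespace):
    """Build a property from the get/set/delete functions found in the namespace."""
    return property(namespace.get("get"), namespace.get("set"), namespace.get("delete"))


class Point:
    def __init__(self):
        self._x = 0

    class x(metaclass=PassNamespace, fcn=make_property):
        def get(self):
            return self._x

        def set(self, value):
            self._x = value

        def delete(self):
            del self._x


p = Point()
p.x = 5
```

After the class statement runs, Point.x is an ordinary property object and no class named x exists, which is exactly the behaviour the two sides of this thread disagree about.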


Personally, I've not seen the desire to pass temporary namespaces to
functions until recently, so whether or not people will use it for
property creation, or any other way that people would find interesting
and/or useful, is at least a bit of prediction.  Maybe people will
prefer to use property('get_foo', 'set_foo', 'del_foo'), who knows?  But
you know what?  Regardless of what people want, they can use metaclasses
right now to create properties, where they would have to wait until
Python 2.5 comes out before they could use this proposed 'create'
statement.


 - Josiah


Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Ronald Oussoren


On 24-okt-2005, at 12:54, Josiah Carlson wrote:




 Metaclass abuse?  Oh, I'm sorry, I thought that the point of metaclasses
 was to offer a way to make magic happen in a somewhat pragmatic
 manner, you know, through metaprogramming.  I would call this particular
 use a practical application of standard Python semantics.


I'd say using a class statement to define a property is metaclass abuse,
as would anything that wouldn't define something class-like. The same is
true for other constructs: using a decorator to define something that is
not a callable would IMHO also be abuse.

That said, I don't really have an opinion on the 'create' statement
proposal yet. It does seem to have a very limited field of use. I'm quite
happy with using property as it is; property('get_foo', 'set_foo') would
take away most if not all of the remaining problems.

Ronald


Re: [Python-Dev] KOI8_U (New codecs checked in)

Walter Dörwald wrote:
 Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
 a complete decoding_table into koi8_u.py?


 KOI8-U is not available as mapping on ftp.unicode.org and
 I only recreated codecs from the mapping files available
 there.
 
 
 OK, so we'd need something that creates a new decoding table from an old
 one + changes, i.e. something like:
 
 def update_decoding_table(table, new):
     table = list(table)
     for (key, value) in new.iteritems():
         table[key] = unichr(value)
     return u"".join(table)

Actually, I'd rather have some official mapping files
for these.

Perhaps we could get someone to upload a mapping file
for KOI8_U to the Unicode site ?!

The mapping is defined in RFC2319:

http://www.faqs.org/rfcs/rfc2319.html

I've put Alexander Yeremenko, the coordinator of
the KOI8-U group on CC.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] KOI8_U (New codecs checked in)



M.-A. Lemburg wrote:
 Walter Dörwald wrote:
 
Why should koi_u.py be defined in terms of koi8_r.py anyway? Why not put
a complete decoding_table into koi8_u.py?


KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.


OK, so we'd need something that creates a new decoding table from an old
one + changes, i.e. something like:

def update_decoding_table(table, new):
    table = list(table)
    for (key, value) in new.iteritems():
        table[key] = unichr(value)
    return u"".join(table)
 
 
 Actually, I'd rather have some official mapping files
 for these.
 
 Perhaps we could get someone to upload a mapping file
 for KOI8_U to the Unicode site ?!
 
 The mapping is defined in RFC2319:
 
 http://www.faqs.org/rfcs/rfc2319.html
 
 I've put Alexander Yeremenko, the coordinator of
 the KOI8-U group on CC.

Hmm, that email address bounces. I've now put Maxim
on CC: Maxim Dzumanenko [EMAIL PROTECTED]

Here's a mapping file for KOI8-U - please check whether
it's correct.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

#
#   Name: KOI8-U (RFC2319) to Unicode
#
#   See RFC2319 for details. This encoding is a modified KOI8-R
#   encoding.
#
0x00  0x0000  #   NULL
0x01  0x0001  #   START OF HEADING
0x02  0x0002  #   START OF TEXT
0x03  0x0003  #   END OF TEXT
0x04  0x0004  #   END OF TRANSMISSION
0x05  0x0005  #   ENQUIRY
0x06  0x0006  #   ACKNOWLEDGE
0x07  0x0007  #   BELL
0x08  0x0008  #   BACKSPACE
0x09  0x0009  #   HORIZONTAL TABULATION
0x0A  0x000A  #   LINE FEED
0x0B  0x000B  #   VERTICAL TABULATION
0x0C  0x000C  #   FORM FEED
0x0D  0x000D  #   CARRIAGE RETURN
0x0E  0x000E  #   SHIFT OUT
0x0F  0x000F  #   SHIFT IN
0x10  0x0010  #   DATA LINK ESCAPE
0x11  0x0011  #   DEVICE CONTROL ONE
0x12  0x0012  #   DEVICE CONTROL TWO
0x13  0x0013  #   DEVICE CONTROL THREE
0x14  0x0014  #   DEVICE CONTROL FOUR
0x15  0x0015  #   NEGATIVE ACKNOWLEDGE
0x16  0x0016  #   SYNCHRONOUS IDLE
0x17  0x0017  #   END OF TRANSMISSION BLOCK
0x18  0x0018  #   CANCEL
0x19  0x0019  #   END OF MEDIUM
0x1A  0x001A  #   SUBSTITUTE
0x1B  0x001B  #   ESCAPE
0x1C  0x001C  #   FILE SEPARATOR
0x1D  0x001D  #   GROUP SEPARATOR
0x1E  0x001E  #   RECORD SEPARATOR
0x1F  0x001F  #   UNIT SEPARATOR
0x20  0x0020  #   SPACE
0x21  0x0021  #   EXCLAMATION MARK
0x22  0x0022  #   QUOTATION MARK
0x23  0x0023  #   NUMBER SIGN
0x24  0x0024  #   DOLLAR SIGN
0x25  0x0025  #   PERCENT SIGN
0x26  0x0026  #   AMPERSAND
0x27  0x0027  #   APOSTROPHE
0x28  0x0028  #   LEFT PARENTHESIS
0x29  0x0029  #   RIGHT PARENTHESIS
0x2A  0x002A  #   ASTERISK
0x2B  0x002B  #   PLUS SIGN
0x2C  0x002C  #   COMMA
0x2D  0x002D  #   HYPHEN-MINUS
0x2E  0x002E  #   FULL STOP
0x2F  0x002F  #   SOLIDUS
0x30  0x0030  #   DIGIT ZERO
0x31  0x0031  #   DIGIT ONE
0x32  0x0032  #   DIGIT TWO
0x33  0x0033  #   DIGIT THREE
0x34  0x0034  #   DIGIT FOUR
0x35  0x0035  #   DIGIT FIVE
0x36  0x0036  #   DIGIT SIX
0x37  0x0037  #   DIGIT SEVEN
0x38  0x0038  #   DIGIT EIGHT
0x39  0x0039  #   DIGIT NINE
0x3A  0x003A  #   COLON
0x3B  0x003B  #   SEMICOLON
0x3C  0x003C  #   LESS-THAN SIGN
0x3D  0x003D  #   EQUALS SIGN
0x3E  0x003E  #   GREATER-THAN SIGN
0x3F  0x003F  #   QUESTION MARK
0x40  0x0040  #   COMMERCIAL AT
0x41  0x0041  #   LATIN CAPITAL LETTER A
0x42  0x0042  #   LATIN CAPITAL LETTER B
0x43  0x0043  #   LATIN CAPITAL LETTER C
0x44  0x0044  #   LATIN CAPITAL LETTER D
0x45  0x0045  #   LATIN CAPITAL LETTER E
0x46  0x0046  #   LATIN CAPITAL LETTER F
0x47  0x0047  #   LATIN CAPITAL LETTER G
0x48  0x0048  #   LATIN CAPITAL LETTER H
0x49  0x0049  #   LATIN CAPITAL LETTER I
0x4A  0x004A  #   LATIN CAPITAL LETTER J
0x4B  0x004B  #   LATIN CAPITAL LETTER K
0x4C  0x004C  #   LATIN CAPITAL LETTER L
0x4D  0x004D  #   LATIN CAPITAL LETTER M
0x4E  0x004E  #   LATIN CAPITAL LETTER N
0x4F  0x004F  #   LATIN CAPITAL LETTER O
0x50  0x0050  #   LATIN CAPITAL LETTER P
0x51  0x0051  #   LATIN CAPITAL LETTER Q
0x52  0x0052  #   LATIN CAPITAL LETTER R
0x53  0x0053  #   LATIN CAPITAL LETTER S
0x54  0x0054  #   LATIN
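
A mapping file in this layout (byte value, code point, then a "#" comment) can be turned into a decoding table with very little code. The following is a rough Python 3 sketch, not gencodec.py itself: the function names are made up for illustration, and the choice of U+FFFE as the placeholder for bytes without a mapping is an assumption of this sketch.

```python
def parse_mapping(text):
    """Parse 'byte  codepoint  # name' lines into a {byte: codepoint} dict."""
    mapping = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split the remaining fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment-only, blank, or undefined entry
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


def build_decoding_table(mapping, size=256):
    """Build a string indexed by byte value; unmapped bytes become U+FFFE."""
    return "".join(chr(mapping.get(byte, 0xFFFE)) for byte in range(size))


SAMPLE = """\
#   Name: KOI8-U (RFC2319) to Unicode
0x00\t0x0000\t#\tNULL
0x41\t0x0041\t#\tLATIN CAPITAL LETTER A
"""

sample_mapping = parse_mapping(SAMPLE)
sample_table = build_decoding_table(sample_mapping)
```

Indexing sample_table with a byte value then gives the decoded character directly, which is the table-based lookup discussed elsewhere in this thread.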

Re: [Python-Dev] Definining properties - a use case for class decorators?

2005-10-24 Thread Michele Simionato

On 10/24/05, Ronald Oussoren [EMAIL PROTECTED] wrote:
 I'd say using a class statement to define a property is metaclass
 abuse, as would anything that wouldn't define something class-like.
 The same is true for other constructs: using a decorator to define
 something that is not a callable would IMHO also be abuse.

+1

 That said, I really don't have an opinion on the 'create' statement
 proposal yet. It does seem to have a very limited field of use.

This is definitely not true. The 'create' statement would have lots of
applications. Off the top of my head I can think of 'create' applied to:

- bunches;
- modules;
- interfaces;
- properties;
- usage in frameworks, for instance providing sugar for
  Object-Relational mappers, or for making templates
  (i.e. a create HTMLPage);
- building custom minilanguages;
- ...

This is why I see a 'create' statement as a frighteningly powerful
addition to the language.
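
The 'create' statement exists only as a proposal in this thread, so it cannot be shown running. What can be shown is the existing pattern it would replace, the one Ronald calls metaclass abuse: a class statement whose "metaclass" is a plain callable that returns a property instead of a class. This is a Python 3 sketch; as_property and Point are hypothetical names.

```python
def as_property(name, bases, namespace):
    """'Metaclass' that builds a property from a class body (not a class)."""
    return property(
        namespace.get("fget"),
        namespace.get("fset"),
        namespace.get("fdel"),
        namespace.get("__doc__"),
    )


class Point:
    def __init__(self, x):
        self._x = x

    # The class statement below binds the name x to a property object.
    class x(metaclass=as_property):
        "The x coordinate."

        def fget(self):
            return self._x

        def fset(self, value):
            self._x = float(value)


p = Point(1)
p.x = 2  # goes through fset, so the stored value is 2.0
```

The body groups getter, setter and docstring under one name, which is the convenience being argued over; the cost is that a class statement no longer produces a class.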

 Michele Simionato

Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-24 Thread Nick Coghlan

Guido van Rossum wrote:
 Right. That was my point. Nick's worried about undecorated __context__
 because he wants to endow generators with a different default
 __context__. I say no to both proposals and the worries cancel each
 other out. EIBTI.

Works for me.

That makes the resolutions for the posted issues:

1. The slot name __context__ will be used instead of __with__
2. The builtin name context is currently off limits due to its ambiguity
3a. generator-iterators do NOT have a native context
3b. Use contextmanager as a builtin decorator to get generator-contexts
4. The __context__ slot will NOT be special cased
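
For resolution 3b, a decorator of this kind is available in current Python as contextlib.contextmanager (a library function rather than the builtin discussed here). The sketch below shows a generator turned into a context manager that way; the tag function and the log list are invented for the example, and it reflects today's with statement, not necessarily the exact protocol being negotiated in this thread.

```python
from contextlib import contextmanager


@contextmanager
def tag(log, name):
    """Record an opening marker, run the with-block, then a closing marker."""
    log.append("<%s>" % name)
    try:
        yield
    finally:
        # Runs whether the block finished normally or raised.
        log.append("</%s>" % name)


events = []
with tag(events, "p"):
    events.append("text")
```

An undecorated generator used in a with statement is rejected, which matches resolution 3a: generators get no native context.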

I'll add those into the PEP and reference this thread after Martin is done 
with the SVN migration.

However, those resolutions bring up the following issues:

   5 a. What exception is raised when EXPR does not have a __context__ method?
 b.  What about when the returned object is missing __enter__ or __exit__?
I suggest raising TypeError in both cases, for symmetry with for loops.
The slot check is made in C code, so I don't see any difficulty in raising
TypeError instead of AttributeError if the relevant slots aren't filled.

   6 a. Should a generic closing context manager be provided?
 b. If yes, should it be a builtin or in a contexttools module?
I'm not too worried about this one for the moment, and it could easily be
left out of the PEP itself. Of the sample managers, it seems the most
universally useful, though.
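
The generic closing manager from question 6 is small enough to write by hand. This Python 3 sketch uses the __enter__/__exit__ pair of the with statement as it exists today; Closing and Resource are illustrative names, and the standard library's contextlib.closing offers the same behaviour.

```python
class Closing:
    """Context manager that calls close() on the wrapped object at exit."""

    def __init__(self, thing):
        self.thing = thing

    def __enter__(self):
        return self.thing

    def __exit__(self, exc_type, exc_value, traceback):
        self.thing.close()
        return False  # do not swallow exceptions from the block


class Resource:
    """Stand-in for anything with a close() method (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


resource = Resource()
with Closing(resource) as opened:
    was_open_inside = not opened.closed
```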

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com

Re: [Python-Dev] New codecs checked in

Walter Dörwald wrote:
I'd like to suggest a small cosmetic change: gencodec.py should output
byte values with two hexdigits instead of four. This makes it easier to
see what is a byte values and what is a codepoint. And it would make
grepping for stuff simpler.

True.

I'll rerun the creation with the above changes sometime this
week.
 
 
 Great, thanks!

Done.

I had to create three custom mapping files for cp1140, koi8-u
and tis-620.

If you want more non-standard charmap codecs converted, please
send me the mapping files in the Unicode standard format for
these files.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Paolino

I'm not sure I completely understood the idea, but deriving the freeze
function from hash gives hash a wider importance.
Is __hash__=id inside a class enough to use an instance of a class derived
from set (sets.Set before 2.5) as a key to a mapping?
Surely I missed the point.
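
One reading of this question can be tried directly. The Python 3 sketch below gives a set subclass an identity-based hash (written as an explicit method rather than the literal __hash__=id spelling) and uses an instance as a dictionary key; IdentitySet is a made-up name. It works, but the key is then found by identity only, even though equality still compares contents.

```python
class IdentitySet(set):
    """A mutable set that hashes by identity so it can serve as a dict key."""

    def __hash__(self):
        return id(self)


first = IdentitySet([1, 2])
lookup = {first: "found"}

# Mutating the key does not disturb the lookup, because the hash is the id.
first.add(3)
still_found = lookup[first]

# An equal-looking but distinct set is a different key.
twin = IdentitySet([1, 2, 3])
twin_is_equal = (twin == first)
twin_is_key = (twin in lookup)
```

So hashability alone is easy to obtain; what it does not provide is the value-based lookup that a frozen copy would give.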


Regards Paolino


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

On 10/24/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 That makes the resolutions for the posted issues:

 1. The slot name __context__ will be used instead of __with__
 2. The builtin name context is currently off limits due to its ambiguity
 3a. generator-iterators do NOT have a native context
 3b. Use contextmanager as a builtin decorator to get generator-contexts
 4. The __context__ slot will NOT be special cased

+1

 I'll add those into the PEP and reference this thread after Martin is done
 with the SVN migration.

 However, those resolutions bring up the following issues:

5 a. What exception is raised when EXPR does not have a __context__ method?
  b.  What about when the returned object is missing __enter__ or __exit__?
 I suggest raising TypeError in both cases, for symmetry with for loops.
 The slot check is made in C code, so I don't see any difficulty in raising
 TypeError instead of AttributeError if the relevant slots aren't filled.

Why are you so keen on TypeError? I find AttributeError totally
appropriate. I don't see symmetry with for-loops as a valuable
property here. AttributeError and TypeError are often interchangeable
anyway.

6 a. Should a generic closing context manager be provided?

No. Let's provide the minimal mechanisms FIRST.

  b. If yes, should it be a builtin or in a contexttools module?
 I'm not too worried about this one for the moment, and it could easily be
 left out of the PEP itself. Of the sample managers, it seems the most
 universally useful, though.

Let's leave some examples just be examples.

I think I'm leaning towards adding __context__ to locks (all types
defined in thread or threading, including condition variables), files,
and decimal.Context, and leave it at that.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Gary Poster


On Oct 23, 2005, at 6:43 PM, Barry Warsaw wrote:

 I've had this PEP laying around for quite a few months.  It was
 inspired by some code we'd written which wanted to be able to get
 immutable versions of arbitrary objects.  I've finally finished the
 PEP, uploaded a sample patch (albeit a bit incomplete), and I'm
 posting it here to see if there is any interest.

 http://www.python.org/peps/pep-0351.html

I like this.  I'd like it better if it integrated with the adapter  
PEP, so that the freezing mechanism for a given type could be  
pluggable, and could be provided even if the original object did not  
contemplate it.  I don't know where the adapter PEP stands: skimming  
through the (most recent?) thread in January didn't give me a clear  
idea.

As another poster mentioned, in-place freezing is also of interest to
me (and is why I read the PEP initially), but, as was also mentioned,
that's probably unrelated to your PEP.

Gary

Re: [Python-Dev] int(string)

2005-10-24 Thread Alan McIntyre

Fredrik Lundh wrote:

does a plain

a = -100.0

still work on your machine?

D'oh - I seriously broke something, then, because it didn't. 
funny_falcon commented on the patch in SF and suggested a change that
took care of that.  I've uploaded the corrected version of the patch,
which now passes all the tests.

Alan

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-24 Thread Raymond Hettinger

[Barry Warsaw]
 I've had this PEP laying around for quite a few months.  It was
 inspired by some code we'd written which wanted to be able to get
 immutable versions of arbitrary objects.


* FWIW, the _as_immutable() protocol was dropped from sets.py for a
reason.  User reports indicated that it was never helpful in practice.
It added complexity and confusion without producing offsetting benefits.

* AFAICT, there are no use cases for freezing arbitrary objects when the
object types are restricted to just lists and sets but not dicts,
arrays, or other containers.  Even if the range of supported types were
expanded, what applications could use this?  Most apps cannot support
generic substitution of lists and sets -- they have too few methods in
common -- they are almost never interchangeable.

* I'm concerned that generic freezing leads to poor design and
hard-to-find bugs.  One class of bugs results from conflating ordered
and unordered collections as lookup keys.  It is difficult to assess
program correctness when the ordered/unordered distinction has been
abstracted away.  A second class of errors can arise when the original
object mutates and gets out-of-sync with its frozen counterpart.

* For a rare app needing mutable lookup keys, a simple recipe would
suffice:

    freeze_pairs = [(list, tuple), (set, frozenset)]

    def freeze(obj):
        try:
            hash(obj)
        except TypeError:
            for sourcetype, desttype in freeze_pairs:
                if isinstance(obj, sourcetype):
                    return desttype(obj)
            raise
        else:
            return obj

Unlike the PEP, the recipe works with older pythons and is trivially
easy to extend to include other containers.
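
For reference, here is the same recipe laid out so that it runs under Python 3, followed by the kind of extension mentioned above. The logic is unchanged from the recipe; the bytearray-to-bytes pair is an illustrative addition that does not come from the thread.

```python
FREEZE_PAIRS = [(list, tuple), (set, frozenset)]


def freeze(obj):
    """Return obj if hashable, else an immutable copy of a known type."""
    try:
        hash(obj)
    except TypeError:
        for sourcetype, desttype in FREEZE_PAIRS:
            if isinstance(obj, sourcetype):
                return desttype(obj)
        raise  # unhashable and no conversion known: re-raise the TypeError
    else:
        return obj


frozen_list = freeze([1, 2])
frozen_set = freeze({1, 2})
unchanged = freeze("abc")

# Extending the recipe is one more pair (illustrative addition).
FREEZE_PAIRS.append((bytearray, bytes))
frozen_bytes = freeze(bytearray(b"xy"))
```

Note that each call on a mutable object builds a new object, which is the point made in the next bullet about the name "freeze".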

* The name freeze is problematic because it suggests an in-place
change.  Instead, the proposed mechanism creates a new object.  In
contrast, explicit conversions like tuple(l) or frozenset(s) are obvious
about their running time, space consumed, and new object identity.  

Overall, I'm -1 on the PEP.  Like a bad C macro, the proposed
abstraction hides too much.  We lose critical distinctions of ordered vs
unordered, mutable vs immutable, new objects vs in-place change, etc.
Without compelling use cases, the mechanism smells like a
hyper-generalization.


Raymond


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

 I'm thinking about making all character strings Unicode (possibly with
 different internal representations a la NSString in Apple's Objective
 C) and introduce a separate mutable bytes array data type. But I could
 use some validation or feedback on this idea from actual
 practitioners.

+1 from me, too.

 I'm tempted to say it would be even better if there was a command line 
 option that could be used to force all binary opens to result in bytes, and 
 require all text opens to specify an encoding.

I like this idea, too.  Presumably plain open(FILENAME, MODE) would
then result in a binary open (no encoding specified), which I've
wanted for a long time (and which makes sense).  But it is a change.
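
For comparison, this is how the bytes/text split looks in Python 3 as it was eventually released: a binary open yields bytes, and a text open with an explicit encoding yields str. The sketch is only a demonstration of that behaviour, not of the command line option proposed above, and the file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write known bytes so the two reads below can be compared.
with open(path, "wb") as handle:
    handle.write("L\u00f6wis\n".encode("utf-8"))

with open(path, "rb") as handle:  # binary open: no decoding happens
    raw = handle.read()

with open(path, "r", encoding="utf-8") as handle:  # text open: explicit encoding
    text = handle.read()
```

Unlike the wish expressed here, a plain open(FILENAME) in Python 3 defaults to text mode, not binary.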

Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

 Python should allow strings to
 contain any Unicode character and should be indexable yielding
 characters rather than half characters. Therefore Python strings
 should appear to be UTF-32.

+1.

Bill

[Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Phil Thompson

I'm implementing a string-like object in an extension module and trying to 
make it as interoperable with the standard string object as possible. To do 
this I'm implementing the relevant slots and the buffer interface. For most 
things this is fine, but there are a small number of methods in 
stringobject.c that don't use the buffer interface - and I don't understand 
why.

Specifically...

string_contains() doesn't which means that...

MyString("foo") in "foobar"

...doesn't work.

s.join(sequence) only allows sequence to contain string or unicode objects.

s.strip([chars]) only allows chars to be a string or unicode object. Same for 
lstrip() and rstrip().

s.ljust(width[, fillchar]) only allows fillchar to be a string object (not 
even a unicode object). Same for rjust() and center().

Other methods happily allow types that support the buffer interface as well as 
string and unicode objects.

I'm happy to submit a patch - I just wanted to make sure that this behaviour 
wasn't intentional for some reason.

Thanks,
Phil

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote:
 I'm implementing a string-like object in an extension module and trying to
 make it as interoperable with the standard string object as possible. To do
 this I'm implementing the relevant slots and the buffer interface. For most
 things this is fine, but there are a small number of methods in
 stringobject.c that don't use the buffer interface - and I don't understand
 why.

 Specifically...

 string_contains() doesn't which means that...

 MyString("foo") in "foobar"

 ...doesn't work.

 s.join(sequence) only allows sequence to contain string or unicode objects.

 s.strip([chars]) only allows chars to be a string or unicode object. Same for
 lstrip() and rstrip().

 s.ljust(width[, fillchar]) only allows fillchar to be a string object (not
 even a unicode object). Same for rjust() and center().

 Other methods happily allow types that support the buffer interface as well as
 string and unicode objects.

 I'm happy to submit a patch - I just wanted to make sure that this behaviour
 wasn't intentional for some reason.

A concern I'd have with fixing this is that Unicode objects also
support the buffer API. In any situation where either str or unicode
is accepted I'd be reluctant to guess whether a buffer object was
meant to be str-like or Unicode-like. I think this covers all the
cases you mention here.

We need to support this better in Python 3000; but I'm not sure you
can do much better in Python 2.x; subclassing from str is unlikely to
work for you because then too many places are going to assume the
internal representation is also the same as for str.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Fredrik Lundh

Guido van Rossum wrote:

 A concern I'd have with fixing this is that Unicode objects also
 support the buffer API. In any situation where either str or unicode
 is accepted I'd be reluctant to guess whether a buffer object was
 meant to be str-like or Unicode-like. I think this covers all the
 cases you mention here.

iirc, SRE solves that by comparing the length of the sequence with the
number of bytes in the buffer.  if length == bytes, it's an 8-bit string; if
length*sizeof(Py_UNICODE) == bytes, it's a Unicode string.
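
The length comparison described here can be written out as a small function. This is a Python 3 sketch of the rule as stated in the message, not the SRE source: classify_buffer is a made-up name, and unicode_unit_size stands in for sizeof(Py_UNICODE), which was 2 or 4 depending on the build.

```python
def classify_buffer(length, nbytes, unicode_unit_size=2):
    """Guess the element type of a buffer from its item count and byte size."""
    if length == nbytes:
        return "8-bit"
    if length * unicode_unit_size == nbytes:
        return "unicode"
    raise TypeError("buffer size does not match either string type")


word = "h\u00e9llo"
as_8bit = classify_buffer(len(word), len(word.encode("latin-1")))
as_wide = classify_buffer(len(word), len(word.encode("utf-16-le")))
```

An empty buffer satisfies both tests at once, so the rule cannot tell the two kinds apart in that case; the order of the checks decides.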

/F




Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

Guido van Rossum wrote:
 On 10/24/05, Phil Thompson [EMAIL PROTECTED] wrote:
 
I'm implementing a string-like object in an extension module and trying to
make it as interoperable with the standard string object as possible. To do
this I'm implementing the relevant slots and the buffer interface. For most
things this is fine, but there are a small number of methods in
stringobject.c that don't use the buffer interface - and I don't understand
why.

Specifically...

string_contains() doesn't which means that...

MyString("foo") in "foobar"

...doesn't work.

s.join(sequence) only allows sequence to contain string or unicode objects.

s.strip([chars]) only allows chars to be a string or unicode object. Same for
lstrip() and rstrip().

s.ljust(width[, fillchar]) only allows fillchar to be a string object (not
even a unicode object). Same for rjust() and center().

Other methods happily allow types that support the buffer interface as well as
string and unicode objects.

I'm happy to submit a patch - I just wanted to make sure that this behaviour
wasn't intentional for some reason.
 
 
 A concern I'd have with fixing this is that Unicode objects also
 support the buffer API. In any situation where either str or unicode
 is accepted I'd be reluctant to guess whether a buffer object was
 meant to be str-like or Unicode-like. I think this covers all the
 cases you mention here.

This situation is a little better than that: the buffer
interface has a slot called getcharbuffer which is what
the string methods use in case they find that a string
argument is not of type str or unicode.

A few don't, but I guess we could fix this.

str.split(), .[lr]strip() all support the getcharbuffer
interface. str.join() currently doesn't. The Unicode object also
leaves out a few cases, among those the ones you mentioned.
If it's better for inter-op, I guess we should make an effort
and let all of them support the getcharbuffer interface.

 We need to support this better in Python 3000; but I'm not sure you
 can do much better in Python 2.x; subclassing from str is unlikely to
 work for you because then too many places are going to assume the
 internal representation is also the same as for str.

As a first step, I'd suggest implementing the getcharbuffer
slot. That will already go a long way.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Guido van Rossum wrote:
  A concern I'd have with fixing this is that Unicode objects also
  support the buffer API. In any situation where either str or unicode
  is accepted I'd be reluctant to guess whether a buffer object was
  meant to be str-like or Unicode-like. I think this covers all the
  cases you mention here.

 This situation is a little better than that: the buffer
 interface has a slot called getcharbuffer which is what
 the string methods use in case they find that a string
 argument is not of type str or unicode.

I stand corrected!

 As a first step, I'd suggest implementing the getcharbuffer
 slot. That will already go a long way.

Phil, if anything still doesn't work after doing what Marc-Andre says,
those would be good candidates for fixes!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-24 Thread Phil Thompson

On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote:
 On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
  Guido van Rossum wrote:
   A concern I'd have with fixing this is that Unicode objects also
   support the buffer API. In any situation where either str or unicode
   is accepted I'd be reluctant to guess whether a buffer object was
   meant to be str-like or Unicode-like. I think this covers all the
   cases you mention here.
 
  This situation is a little better than that: the buffer
  interface has a slot called getcharbuffer which is what
  the string methods use in case they find that a string
  argument is not of type str or unicode.

 I stand corrected!

  As a first step, I'd suggest implementing the getcharbuffer
  slot. That will already go a long way.

 Phil, if anything still doesn't work after doing what Marc-Andre says,
 those would be good candidates for fixes!

I have implemented getcharbuffer - I was highlighting those methods where the 
getcharbuffer implementation was ignored.

I'll put a patch together.

Phil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

Neil Hodgson wrote:
For Windows, the code will get a little uglier, needing to perform
 an allocation/encoding and deallocation more often than at present but
 I don't think there will be a speed degradation as Windows is
 currently performing a conversion from 8 bit to UTF-16 inside many
 system calls.
[...]
 
For indexing UTF-16, a flag could be set to show if the string is
 all in the base plane and if not, an index could be constructed when
 and if needed.

There are many design alternatives: one option would be to support
*three* internal representations in a single type, generating the
others from the one operation existing as needed. The default, initial
representation might be UTF-8, with UCS-4 only being generated when
indexing occurs, and UCS-2 only being generated when the API requires
it. On concatenation, always concatenate just one representation: either
one that is already present in both operands, else UTF-8.
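
A toy version of this design can make the trade-off concrete. The Python 3 sketch below keeps UTF-8 as the initial representation and builds a fixed-width (UCS-4) copy only when indexing is first requested; LazyText is a hypothetical class, the UCS-2 form is left out, and concatenation always falls back to UTF-8 for brevity.

```python
class LazyText:
    """Text stored as UTF-8, with a UCS-4 copy generated on first indexing."""

    def __init__(self, utf8):
        self._utf8 = bytes(utf8)
        self._ucs4 = None  # built lazily

    @property
    def has_ucs4(self):
        return self._ucs4 is not None

    def __getitem__(self, index):
        if self._ucs4 is None:
            # One linear pass; afterwards every index lookup is constant time.
            self._ucs4 = self._utf8.decode("utf-8").encode("utf-32-le")
        if not 0 <= index < len(self._ucs4) // 4:
            raise IndexError("LazyText index out of range")
        unit = self._ucs4[4 * index:4 * index + 4]
        return chr(int.from_bytes(unit, "little"))

    def __add__(self, other):
        # Concatenate a single representation only: here, always UTF-8.
        return LazyText(self._utf8 + other._utf8)


text = LazyText("a\u20ac\U0001F600".encode("utf-8"))
before = text.has_ucs4
third = text[2]
after = text.has_ucs4
joined = text + LazyText(b"!")
```

A string that is never indexed stays compact, while the first index operation pays for a full conversion and a second copy in memory.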

  It'd be good to get some feel for what proportion of
 string operations performed require indexing. Many, such as
 startswith, split, and concatenation don't require indexing. The
 proportion of operations that use indexing to scan strings would also
 be interesting as adding a (currentIndex, currentOffset) cursor to
 string objects would be another approach.

Indeed. My guess is that indexing is more common than you think,
especially when iterating over the string. Of course, iteration
could also operate on UTF-8, if you introduced string iterator
objects.
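
Iteration over UTF-8 without any indexing can be sketched as a generator that reads the lead byte of each sequence to learn its length. This Python 3 example assumes well-formed UTF-8 input and does no validation; iter_code_points is an illustrative name.

```python
def iter_code_points(data):
    """Yield one character at a time from well-formed UTF-8 bytes."""
    position = 0
    while position < len(data):
        lead = data[position]
        if lead < 0x80:
            size = 1          # 0xxxxxxx
        elif lead >> 5 == 0b110:
            size = 2          # 110xxxxx
        elif lead >> 4 == 0b1110:
            size = 3          # 1110xxxx
        else:
            size = 4          # 11110xxx
        yield data[position:position + size].decode("utf-8")
        position += size


characters = list(iter_code_points("a\u00e9\u20ac\U0001F600".encode("utf-8")))
```

Each step costs time proportional to one character, so a full scan stays linear even though random access into the same bytes would not be constant time.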

Regards,
Martin

Re: [Python-Dev] New codecs checked in

Walter Dörwald wrote:
 Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
 a complete decoding_table into koi8_u.py?

Not sure. Unfortunately, the tables being used as source are not part of
the Python source, so nobody except MAL can faithfully regenerate them.
If they were part of the Python source, explicitly adding one for
KOI8-U would certainly be feasible.

 I.e. change:
 
 decoding_map.update({
 0x0080: 0x0402, #  CYRILLIC CAPITAL LETTER DJE

Hmm. I was suggesting to remove decoding_map completely, in which
case neither the current form nor your suggested cosmetic change
would survive.

 to
 
 decoding_table = (
 u'\x00' # 0x00 -> U+0000 NULL

Using U+ in comments to denote the codepoints is a good idea,
anyway.
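
The two styles under discussion, a decoding_map dictionary and a decoding_table string, can both be fed to the charmap decoder. The Python 3 sketch below builds a toy codec covering only bytes 0x00 to 0x80, reusing the 0x80 entry quoted above; it is an illustration, not the contents of any shipped codec module.

```python
import codecs

# Dictionary style: byte value -> code point.
decoding_map = {byte: byte for byte in range(0x80)}
decoding_map[0x80] = 0x0402  # CYRILLIC CAPITAL LETTER DJE

# Table style: a string indexed by byte value.
decoding_table = "".join(chr(decoding_map[byte]) for byte in range(0x81))

from_map, _ = codecs.charmap_decode(b"A\x80", "strict", decoding_map)
from_table, _ = codecs.charmap_decode(b"A\x80", "strict", decoding_table)
```

Both calls give the same text; the table form replaces a dictionary lookup per byte with a string index, which is why removing decoding_map was being considered.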

Regards,
Martin

Re: [Python-Dev] New codecs checked in

M.-A. Lemburg wrote:
 I just left them in because I thought they wouldn't do any harm
 and might be useful in some applications.
 
 Removing them where not directly needed by the codec would not
 be a problem.

I think memory usage caused is measurable (I estimated 4KiB per
dictionary). More importantly, people apparently currently change
the dictionaries we provide and expect the codecs to automatically
pick up the modified mappings. It would be better if the breakage
is explicit (i.e. they get an AttributeError on the variable) instead
of implicit (their changes to the mapping simply have no effect
anymore).

 KOI8-U is not available as mapping on ftp.unicode.org and
 I only recreated codecs from the mapping files available
 there.

I think we should come up with mapping tables for the additional
codecs as well, and maintain them in the CVS. This also applies
to things like rot13.

 I'll rerun the creation with the above changes sometime this
 week.

I hope I can finish my encoding routine shortly, which again
results in changes to the codecs (replacing the encoding dictionaries
with other lookup tables).
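
In current Python an encoding lookup table of this general kind can be derived from a decoding table with codecs.charmap_build. Whether that matches the routine being written here is not something this thread confirms, so the Python 3 sketch below should be read as an illustration of the idea only.

```python
import codecs
import encodings.koi8_r

# Derive an encoding lookup structure from the 256-character decoding table.
encoding_table = codecs.charmap_build(encodings.koi8_r.decoding_table)

# Encode one Cyrillic letter through the derived table.
encoded, consumed = codecs.charmap_encode("\u042f", "strict", encoding_table)
```

With this arrangement the decoding table is the single source of truth, and a separately edited encoding dictionary no longer has any effect.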

Regards,
Martin

Re: [Python-Dev] New codecs checked in

M.-A. Lemburg wrote:

 I had to create three custom mapping files for cp1140, koi8-u
 and tis-620.

Can you please publish the files you have used somewhere? They
best go into the Python CVS.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Antoine Pitrou


 There are many design alternatives: one option would be to support
 *three* internal representations in a single type, generating the
 others from the one operation existing as needed. The default, initial
 representation might be UTF-8, with UCS-4 only being generated when
 indexing occurs, and UCS-2 only being generated when the API requires
 it. On concatenation, always concatenate just one representation: either
 one that is already present in both operands, else UTF-8.

Wouldn't it be simpler to use:
- one-byte representation if every character <= 0xFF
- two-byte representation if every character <= 0xFFFF
- four-byte representation otherwise

Then combining several strings means using the larger representation as
a result (*). In practice, most use cases will not involve the four-byte
representation.

(*) a heuristic can be invented so that, when producing a smaller string
(by stripping/slicing/etc.), it will sometimes check whether a
narrower representation is possible.
For example: store the length of the string when the last check
occurred, and do a new check when the length falls below half that
value.
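
The selection rule and the re-check heuristic can be written down in a few lines. This Python 3 sketch only computes which width would be chosen; it does not implement the storage itself, and the function names are invented for the example.

```python
def unit_width(text):
    """Bytes per character needed to store every character of text."""
    largest = max(map(ord, text), default=0)
    if largest <= 0xFF:
        return 1
    if largest <= 0xFFFF:
        return 2
    return 4


def concat_width(left, right):
    """Combining strings uses the larger of the two representations."""
    return max(unit_width(left), unit_width(right))


def should_recheck(current_length, last_checked_length):
    """Heuristic: look for a narrower width once the length has halved."""
    return current_length < last_checked_length / 2


narrow = unit_width("abc")
medium = unit_width("\u20ac")
wide = unit_width("\U0001F600")
mixed = concat_width("abc", "\u20ac")
```

Because every element in a given string has the same width, indexing stays a constant-time operation in all three cases.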

Regards

Antoine.



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Indeed. My guess is that indexing is more common than you think,
 especially when iterating over the string. Of course, iteration
 could also operate on UTF-8, if you introduced string iterator
 objects.

Python's slice-and-dice model pretty much ensures that indexing is
common. Almost everything is ultimately represented as indices: regex
search results have the index in the API, find()/index() return
indices, many operations take a start and/or end index. As long as
that's the case, indexing better be fast.

Changing the APIs would be much work, although perhaps not impossible
for Python 3000. For example, Raymond Hettinger's partition() API
doesn't refer to indices at all, and can replace many uses of find()
or index().
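
As an illustration, here is the same split written with find() and with
partition(), assuming the semantics that were later adopted as
str.partition(). The sample header line is made up for the example.

```python
line = "Content-Type: text/plain"

# Index-based version: find the separator, then slice around it.
pos = line.find(":")
if pos >= 0:
    name, value = line[:pos], line[pos + 1:]
else:
    name, value = line, ""

# partition() version: no indices appear in the calling code.
name2, sep, value2 = line.partition(":")
```

When the separator is missing, partition() returns the whole string
followed by two empty strings, so the caller never has to test an index
against -1.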

Still, the mere existence of __getitem__ and __getslice__ on strings
makes it necessary to implement them efficiently. How realistic would
it be to drop them? What should replace them? Some kind of abstract
pointers-into-strings perhaps, but that seems much more complex.

The trick seems to be to support both simple programs manipulating
short strings (where indexing is probably the easiest API to
understand, and the additional copying is unlikely to cause
performance problems), as well as programs manipulating very large
buffers containing text and doing sophisticated string processing on
them. Perhaps we could provide a different kind of API to support the
latter, perhaps based on a mutable character buffer data type without
direct indexing?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

On 10/24/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Guido van Rossum wrote:
  Changing the APIs would be much work, although perhaps not impossible
  for Python 3000. For example, Raymond Hettinger's partition() API
  doesn't refer to indices at all, and can replace many uses of find()
  or index().

 I think Neil's proposal is not to make them go away, but to implement
 them less efficiently. For example, if the internal representation
 is UTF-8, indexing requires linear time, as opposed to constant time.
 If the internal representation is UTF-16, and you have a flag to
 indicate whether there are any surrogates on the string, indexing
 is constant if the flag is false, else linear.

I understand all that. My point is that it's a bad idea to offer an
indexing operation that isn't O(1).
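
To make the cost difference concrete, here is a rough sketch of the
UTF-16 case with a surrogate flag. Utf16String and char_at are
hypothetical names, and the sketch assumes well-formed text and a
non-negative index.

```python
import struct


class Utf16String:
    """Toy string stored as UTF-16 code units (hypothetical class)."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Flag computed once, when the string is created.
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def char_at(self, index):
        """Return the character at a non-negative character index."""
        if not self.has_surrogates:
            # Constant time: exactly one code unit per character.
            return chr(self.units[index])
        # Linear time: walk from the start, stepping over surrogate pairs.
        pos = 0
        for _ in range(index):
            pos += 2 if 0xD800 <= self.units[pos] <= 0xDBFF else 1
        unit = self.units[pos]
        if 0xD800 <= unit <= 0xDBFF:
            low = self.units[pos + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)
```

A single character outside the Basic Multilingual Plane anywhere in the
string is enough to push every index operation onto the linear path.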

  Perhaps we could provide a different kind of API to support the
  latter, perhaps based on a mutable character buffer data type without
  direct indexing?

 There are different design goals conflicting here:
 - some think: all my data is ASCII, so I want to only use one
byte per character.
 - others think: all my data goes to the Windows API, so I want
to use 2 bytes per character.
 - yet others think: I want all of Unicode, with proper, efficient
indexing, so I want four bytes per char.

I doubt the last one though. Probably they really don't want efficient
indexing, they want to perform higher-level operations that currently
are only possible using efficient indexing or slicing. With the right
API, perhaps they could work just as efficiently with an internal
representation of UTF-8.

 It's not so much a matter of API as a matter of internal
 representation. The API doesn't have to change (except for the
 very low-level C API that directly exposes Py_UNICODE*, perhaps).

I think the API should reflect the representation *to some extent*,
namely it shouldn't claim to have operations that are typically
thought of as O(1) that can only be implemented as O(n). An internal
representation of UTF-8 might make everyone happy except heavy Windows
users; but it requires changes to the API so people won't be writing
Python 2.x-style string slinging code.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson

M.-A. Lemburg:

 Unicode has the concept of combining code points, e.g. you can
 store an é (e with an accent) as e + '. Now if you slice
 off the accent, you'll break the character that you encoded
 using combining code points.
 ...
 next_indextype(u, index) -> integer

 Returns the Unicode object index for the start of the next
 indextype found after u[index] or -1 in case no next element
 of this type exists.

   Should entity breakage be further discouraged by returning a slice
here rather than an object index?

   Something like:

i = first_grapheme(u)
x = 0
while x < width and u[i] != "\n":
    x, _ = draw(u[i], (x, y))
    i = next_grapheme(u, i)
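
A rough sketch of both styles, assuming a much simplified notion of
grapheme: a base character followed by any combining marks, as reported
by unicodedata.combining(). Real grapheme segmentation has more rules
than this. next_grapheme follows the proposed signature; graphemes is a
hypothetical helper that yields slices instead of indices.

```python
import unicodedata


def next_grapheme(u, index):
    """Return the index where the next grapheme starts, or -1 if none."""
    i = index + 1
    # Skip combining marks that belong to the grapheme at u[index].
    while i < len(u) and unicodedata.combining(u[i]):
        i += 1
    return i if i < len(u) else -1


def graphemes(u):
    """Yield each base character plus its combining marks as a slice."""
    i = 0 if u else -1
    while i != -1:
        j = next_grapheme(u, i)
        yield u[i:j] if j != -1 else u[i:]
        i = j
```

Returning slices means the caller never holds an index that points into
the middle of a combining sequence.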

   Neil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

  - yet others think: I want all of Unicode, with proper, efficient
 indexing, so I want four bytes per char.
 
 I doubt the last one though. Probably they really don't want efficient
 indexing, they want to perform higher-level operations that currently
 are only possible using efficient indexing or slicing. With the right
 API, perhaps they could work just as efficiently with an internal
 representation of UTF-8.

I just got mail this morning from a researcher who wants exactly what
Martin described, and wondered why the default MacPython 2.4.2 didn't
provide it by default. :-)

Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

On 10/24/05, Bill Janssen [EMAIL PROTECTED] wrote:
   - yet others think: I want all of Unicode, with proper, efficient
  indexing, so I want four bytes per char.
 
  I doubt the last one though. Probably they really don't want efficient
  indexing, they want to perform higher-level operations that currently
  are only possible using efficient indexing or slicing. With the right
  API, perhaps they could work just as efficiently with an internal
  representation of UTF-8.

 I just got mail this morning from a researcher who wants exactly what
 Martin described, and wondered why the default MacPython 2.4.2 didn't
 provide it by default. :-)

Oh, I don't doubt that they want it. But often they don't *need* it,
and the higher-level goal they are trying to accomplish can be dealt
with better in a different way. (Sort of my response to people asking
for static typing in Python as well. :-)

Did they tell you what they were trying to do that MacPython 2.4.2
wouldn't let them, beyond representing a large Unicode string as an
array of 4-byte integers?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing

Guido van Rossum wrote:

 I think the API should reflect the representation *to some extent*,
 namely it shouldn't claim to have operations that are typically
 thought of as O(1) that can only be implemented as O(n).

Maybe a compromise could be reached by using a
btree of chunks or something, so indexing is
O(log n). Not as good as O(1) but a lot better
than O(n).
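
One way this could look, as a sketch only: a flat, sorted list of chunk
start offsets searched with bisect stands in for the tree descent, and
each chunk holds a bounded number of UTF-8 bytes. ChunkedUtf8, char_at
and max_bytes are hypothetical names, and the index is assumed to be
non-negative.

```python
import bisect


class ChunkedUtf8:
    """Toy UTF-8 string split into bounded chunks (hypothetical class)."""

    def __init__(self, text, max_bytes=8):
        self.chunks = []   # UTF-8 bytes of each chunk
        self.starts = []   # character index at which each chunk starts
        current, start = bytearray(), 0
        for i, ch in enumerate(text):
            encoded = ch.encode("utf-8")
            if current and len(current) + len(encoded) > max_bytes:
                self.chunks.append(bytes(current))
                self.starts.append(start)
                current, start = bytearray(), i
            current += encoded
        if current:
            self.chunks.append(bytes(current))
            self.starts.append(start)

    def char_at(self, index):
        """Return the character at a non-negative character index."""
        # O(log n) search for the chunk containing the index ...
        k = bisect.bisect_right(self.starts, index) - 1
        # ... then decode only that one bounded-size chunk.
        return self.chunks[k].decode("utf-8")[index - self.starts[k]]
```

The work inside a chunk is bounded by max_bytes, so the overall lookup
cost is dominated by the logarithmic search.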

-- 
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing

Guido van Rossum wrote:

 Python's slice-and-dice model pretty much ensures that indexing is
 common. Almost everything is ultimately represented as indices: regex
 search results have the index in the API, find()/index() return
 indices, many operations take a start and/or end index.

Maybe the idea of string views should be reconsidered in
light of this. It's been criticised on the grounds that
its use could keep large strings alive longer than needed,
but if operations that currently return indices instead
returned string views, this wouldn't be any more of a
concern than it is now, especially if there is an easy
way to explicitly materialise the view as an independent
string when wanted.
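
A minimal sketch of what such a view might look like. StringView and its
methods are hypothetical; here find() returns a view of the match rather
than an index, and str() materialises the view.

```python
class StringView:
    """Read-only window onto a base string (hypothetical class)."""

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def find(self, needle):
        """Return a view of the first match inside this view, or None."""
        pos = self.base.find(needle, self.start, self.stop)
        if pos < 0:
            return None
        return StringView(self.base, pos, pos + len(needle))

    def after(self):
        """Return a view of the rest of the base string after this view."""
        return StringView(self.base, self.stop, len(self.base))

    def __str__(self):
        # Explicitly materialise the view as an independent string.
        return self.base[self.start:self.stop]
```

For example, StringView("key=value").find("=").after() is a view of
"value" that still refers to the original string until str() is called
on it.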

-- 
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).