[Python-Dev] Re: Re: Re: marshal / unmarshal

2005-04-11 Thread Fredrik Lundh
Tim Peters wrote:
> [Fredrik Lundh]
>> is changing the marshal format really the right thing to do at this
>> point?
>
> I don't see anything special about "this point" -- it's just sometime
> between 2.4.1 and 2.5a0.  What do you have in mind?

I was under the impression that the marshal format has been stable for
quite a long time (people are using it for various RPC protocols, among
other things).  I might be wrong.

 



___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Re: Re: Re: marshal / unmarshal

2005-04-11 Thread Bob Ippolito
On Apr 11, 2005, at 12:33 AM, Fredrik Lundh wrote:
> Tim Peters wrote:
>> [Fredrik Lundh]
>>> is changing the marshal format really the right thing to do at this
>>> point?
>>
>> I don't see anything special about "this point" -- it's just sometime
>> between 2.4.1 and 2.5a0.  What do you have in mind?
>
> I was under the impression that the marshal format has been stable for
> quite a long time (people are using it for various RPC protocols, among
> other things).  I might be wrong.

The documentation for marshal explicitly states that you should not use 
it for such purposes.

There's also a version argument to dumps and dump (though the argument 
list in the dump documentation doesn't say so), where version 0 is 
pre-2.4, and version 1 is 2.4+.  I don't think it's out of the question 
to add a version 2 for 2.5+ that uses a better serialization for floats 
(and it should probably add set/frozenset too since those are builtins 
now).
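To make the float part of Bob's suggestion concrete, here is a small comparison using the `marshal` module of modern CPython (Python 3 syntax, not the 2.4 of this thread). Current `marshal` does have a version 2 that writes floats as 8 binary bytes, so the text and binary encodings can be compared directly; the exact byte strings below are what current CPython produces, not something the 2005 format promised.

```python
import marshal
import struct

x = 1.5

# Versions 0 and 1: type code 'f', a one-byte length, then the decimal text.
text_form = marshal.dumps(x, 0)

# Version 2 and later: type code 'g', then 8 raw IEEE-754 bytes, little-endian.
binary_form = marshal.dumps(x, 2)

print(text_form)    # b'f\x031.5'
print(binary_form)  # b'g\x00\x00\x00\x00\x00\x00\xf8?'

# For an ordinary finite float both forms load back to the same value.
assert marshal.loads(text_form) == marshal.loads(binary_form) == x
```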

-bob


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
Tim Peters <[EMAIL PROTECTED]> writes:

> The 754 standard doesn't say anything about how the difference between
> signaling and quiet NaNs is represented.  So it's possible that a qNaN
> on one box would "look like" an sNaN on a different box, and vice
> versa.  But since most people run with all FPU traps disabled, and
> Python doesn't expose a way to read the FPU status flags, they
> couldn't tell the difference.

OK.  Do you have any intuition as to whether 754 implementations
actually *do* differ on this point?

> Copying bytes works perfectly for all other cases (signed zeroes,
> non-zero finites, infinities), because their representations are
> wholly defined, although it's possible that a subnormal on one box
> will be treated like a zero (with the same sign) on a
> partially-conforming box.

I'd find struggling to care about that pretty hard.

>> [1] I'm slightly worried about oddball systems that do insane things
>>     with the FPU by default -- but don't think the mooted change would
>>     make things any worse.
>
> Sorry, don't know what that means.

Neither do I, now.  Oh well.

>> The question, of course, is how to tell.
>
> Store a few small doubles at module initialization time and stare at

./configure time, surely?

> their bits.  That's enough to settle whether a 754 format is in use,
> and, if it is, whether it's big-endian or little-endian.

Do you have a pointer to code that does this?  Googling around the
subject appears to turn up lots of Python stuff...

>> [2] Exaggeration, I realize -- but how many non-754 systems are out
>>     there?  How many will see Python 2.5?
>
> No idea here.  The existing pack routines strive to do a good job of
> _creating_ an IEEE-754-format representation regardless of platform
> representation.  I assume that code would still be present, so
> "oddball" platforms would be left no worse off than they are now.

Well, yes, given the above.  The text this footnote was attached to
was asking if just assuming 754 float formats would inconvenience
anyone.

Cheers,
mwh

-- 
  I don't have any special knowledge of all this. In fact, I made all
  the above up, in the hope that it corresponds to reality.
-- Mark Carroll, ucam.chat


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
[Tim]
>> The 754 standard doesn't say anything about how the difference between
>> signaling and quiet NaNs is represented.  So it's possible that a qNaN
>> on one box would "look like" an sNaN on a different box, and vice
>> versa.  But since most people run with all FPU traps disabled, and
>> Python doesn't expose a way to read the FPU status flags, they
>> couldn't tell the difference.

[mwh]
> OK.  Do you have any intuition as to whether 754 implementations
> actually *do* differ on this point?

Not anymore -- hasn't been part of my job, or a hobby, for over a
decade.  There were differences a decade+ ago.  All NaNs have all
exponent bits set, and at least one mantissa bit set, and every bit
pattern of that form represents a NaN.  That's all the standard says. 
The most popular way to distinguish quiet from signaling NaNs keyed
off the most-significant mantissa bit:  set for a qNaN, clear for an
sNaN.  It's possible that all 754 HW does that now.
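The bit rules Tim lists can be checked mechanically. The sketch below is modern Python, and the helper names are made up for illustration. It pulls a double's 64 bits out with `struct` and applies exactly those rules: all exponent bits set plus a nonzero mantissa means NaN. The quiet/signaling split uses the popular top-mantissa-bit convention, which is an assumption here, not something the 754 standard of the time required.

```python
import struct

EXPONENT_MASK = 0x7FF0_0000_0000_0000   # the 11 exponent bits
MANTISSA_MASK = 0x000F_FFFF_FFFF_FFFF   # the 52 stored mantissa bits
QUIET_BIT     = 0x0008_0000_0000_0000   # most-significant mantissa bit

def double_bits(x: float) -> int:
    """Return the IEEE-754 bit pattern of x as a 64-bit unsigned integer."""
    return struct.unpack('>Q', struct.pack('>d', x))[0]

def classify(bits: int) -> str:
    """Classify a raw 64-bit pattern (kept as an int, so an sNaN is never
    loaded into the FPU and possibly quieted)."""
    if (bits & EXPONENT_MASK) != EXPONENT_MASK:
        return 'finite'            # includes zeros and subnormals
    if (bits & MANTISSA_MASK) == 0:
        return 'infinity'
    # Every remaining pattern is a NaN; the split below is the common
    # convention, not a 754-1985 requirement.
    return 'quiet NaN' if bits & QUIET_BIT else 'signaling NaN'

print(classify(double_bits(1.5)))         # finite
print(classify(0x7FF0_0000_0000_0000))    # infinity
print(classify(0x7FF8_0000_0000_0000))    # quiet NaN
print(classify(0x7FF0_0000_0000_0001))    # signaling NaN
```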

One difference does still exist: Pentium hardware adds a third
not-a-number possibility.  In addition to 754's quiet and signaling
NaNs, it also has "indeterminate" values.  Here with native Windows
Python 2.4 on a Pentium:

>>> inf = 1e300 * 1e300
>>> inf - inf   # indeterminate
-1.#IND
>>> - _  # but the negation of IND is a quiet NaN
1.#QNAN
>>>

Do the same thing under Cygwin Python on the same box and it prints "NaN" twice.

Do people care about this?  I don't know.  It seems unlikely -- in
effect, IND just gives a special string name to a single one of the
many bit patterns that represent a quiet NaN.  OTOH, Pentium hardware
still preserves this distinction, and MS library docs do too.  IND
isn't part of the 754 standard (although, IIRC, it was part of a
pre-standard draft, which Intel implemented and is now stuck with).
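To make the IND point concrete: the NaN an x86 FPU produces for inf - inf is, as far as I know, the pattern Intel's manuals call the "QNaN floating-point indefinite" (sign bit set, exponent all ones, only the top mantissa bit set). The sketch below (modern Python, hypothetical names) checks that this pattern and its negation are both ordinary quiet NaNs, so MSVC's -1.#IND and 1.#QNAN differ only in the sign bit.

```python
import math
import struct

SIGN_BIT = 1 << 63
QUIET_NAN_MASK = 0x7FF8_0000_0000_0000  # exponent all ones + top mantissa bit

# Assumed x86 "indefinite" NaN, printed by MSVC's runtime as -1.#IND.
IND_BITS = 0xFFF8_0000_0000_0000

def is_quiet_nan(bits: int) -> bool:
    """True if the 64-bit pattern is a NaN with the quiet bit set."""
    return (bits & QUIET_NAN_MASK) == QUIET_NAN_MASK

# IEEE negation flips only the sign bit, so -IND is the positive quiet NaN
# that the same runtime prints as 1.#QNAN.
negated = IND_BITS ^ SIGN_BIT

print(hex(negated))                                   # 0x7ff8000000000000
print(is_quiet_nan(IND_BITS), is_quiet_nan(negated))  # True True
print(math.isnan(struct.unpack('>d', IND_BITS.to_bytes(8, 'big'))[0]))  # True
```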

>> Copying bytes works perfectly for all other cases (signed zeroes,
>> non-zero finites, infinities), because their representations are
>> wholly defined, although it's possible that a subnormal on one box
>> will be treated like a zero (with the same sign) on a
>> partially-conforming box.

> I'd find struggling to care about that pretty hard.

Me too.

>>> The question, of course, is how to tell.

>> Store a few small doubles at module initialization time and stare at

> ./configure time, surely?

Unsure.  Not all Python platforms _have_ "./configure time".  Module
initialization code is harder to screw up for that reason (the code is
in an obvious place then, self-contained, and doesn't require any
relevant knowledge of any platform porter unless/until it breaks).

>> their bits.  That's enough to settle whether a 754 format is in use,
>> and, if it is, whether it's big-endian or little-endian.

> Do you have a pointer to code that does this?

No.  Pemberton's enquire.c contains enough code to do it.  Given how
few distinct architectures still exist, it's probably enough to store
just double x = 1.5 and stare at it.
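Tim's "store 1.5 and stare at it" heuristic is small enough to sketch. This is modern Python rather than the C that would actually go into module initialization, and the function name and return strings are made up for illustration. Like the heuristic, it only recognizes the two common 754 byte orders; anything else is reported as unknown, so a caller could fall back to the existing portable pack routines.

```python
import struct

# 1.5 is 0x3FF8000000000000 as a big-endian IEEE-754 double.
BIG_ENDIAN_1_5 = bytes([0x3F, 0xF8, 0, 0, 0, 0, 0, 0])
LITTLE_ENDIAN_1_5 = BIG_ENDIAN_1_5[::-1]

def native_double_format() -> str:
    """Guess the platform's double layout by inspecting the bytes of 1.5."""
    if struct.calcsize('d') != 8:
        return 'unknown'
    stored = struct.pack('d', 1.5)   # native mode: the raw in-memory bytes
    if stored == BIG_ENDIAN_1_5:
        return 'IEEE, big-endian'
    if stored == LITTLE_ENDIAN_1_5:
        return 'IEEE, little-endian'
    return 'unknown'

print(native_double_format())   # IEEE, little-endian on x86, for example
```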

>>> [2] Exaggeration, I realize -- but how many non-754 systems are out
>>>     there?  How many will see Python 2.5?

>> No idea here.  The existing pack routines strive to do a good job of
>> _creating_ an IEEE-754-format representation regardless of platform
>> representation.  I assume that code would still be present, so
>> "oddball" platforms would be left no worse off than they are now.
 
> Well, yes, given the above.  The text this footnote was attached to
> was asking if just assuming 754 float formats would inconvenience
> anyone.

I think I'm still missing your intent here.  If you're asking whether
Python can blindly assume that 754 is in use, I'd say that's
undesirable but defensible if necessary.


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
Tim Peters <[EMAIL PROTECTED]> writes:

> [Tim]
>>> The 754 standard doesn't say anything about how the difference between
>>> signaling and quiet NaNs is represented.  So it's possible that a qNaN
>>> on one box would "look like" an sNaN on a different box, and vice
>>> versa.  But since most people run with all FPU traps disabled, and
>>> Python doesn't expose a way to read the FPU status flags, they
>>> couldn't tell the difference.
>
> [mwh]
>> OK.  Do you have any intuition as to whether 754 implementations
>> actually *do* differ on this point?
>
> Not anymore -- hasn't been part of my job, or a hobby, for over a
> decade.  There were differences a decade+ ago.  All NaNs have all
> exponent bits set, and at least one mantissa bit set, and every bit
> pattern of that form represents a NaN.  That's all the standard says. 
> The most popular way to distinguish quiet from signaling NaNs keyed
> off the most-significant mantissa bit:  set for a qNaN, clear for an
> sNaN.  It's possible that all 754 HW does that now.

[snip details]

OK, so the worst that could happen here is that moving marshal data
from one box to another could turn one sort of NaN into another?  This
doesn't seem very bad.

[denorms]

>> I'd find struggling to care about that pretty hard.
>
> Me too.

Good.

>>>> The question, of course, is how to tell.
>
>>> Store a few small doubles at module initialization time and stare at
>
>> ./configure time, surely?
>
> Unsure.  Not all Python platforms _have_ "./configure time".  

But they all have pyconfig.h.

> Module initialization code is harder to screw up for that reason
> (the code is in an obvious place then, self-contained, and doesn't
> require any relevant knowledge of any platform porter unless/until
> it breaks).

Well, sure, but false negatives are not a big deal here.

>>> their bits.  That's enough to settle whether a 754 format is in use,
>>> and, if it is, whether it's big-endian or little-endian.
>
>> Do you have a pointer to code that does this?
>
> No.  Pemberton's enquire.c contains enough code to do it.  

Yikes!  And much else besides.

> Given how few distinct architectures still exist, it's probably
> enough to store just double x = 1.5 and stare at it.

Something along these lines:

double x = 1.5;
int is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp(&x, "\077\370\000\000\000\000\000\000", 8) == 0;

?

[me being obscure]
> I think I'm still missing your intent here.  If you're asking whether
> Python can blindly assume that 754 is in use, I'd say that's
> undesirable but defensible if necessary.

Yes, that's what I was asking, in a rather obscure way.

Cheers,
mwh

-- 
  Strangely enough  I saw just such a beast at  the grocery store
  last night. Starbucks sells Javachip. (It's ice cream, but that
  shouldn't be an obstacle for the Java marketing people.)
 -- Jeremy Hylton, 29 Apr 1997


[Python-Dev] args attribute of Exception objects

2005-04-11 Thread Sébastien de Menten
Hi,
When I need to make sense of a python exception, I often need to parse the 
string exception in order to retrieve the data.
Example:

try:
   print foo
except NameError, e:
   print e.args
   symbol = e.args[0][17:-16]
==> ("NameError: name 'foo' is not defined", )
or
try:
   (4).foo
except AttributeError, e:
   print e.args
==> ("'int' object has no attribute 'foo'",)
Moreover, in the documentation about Exception, I read
"""Warning: Messages to exceptions are not part of the Python API. Their 
contents may change from one version of Python to the next without warning 
and should not be relied on by code which will run under multiple versions 
of the interpreter. """

So even args cannot be relied upon!
Two questions:
 1) Did I miss something in dealing with exceptions?
 2) Could .args be changed to something more like:
   a) first example: e.args = ('foo', "NameError: name 'foo' is not defined")
   b) second example: e.args = (4, 'foo', "'int' object has no attribute 'foo'",)
 The message string can already be retrieved with str(e), so keeping it
 in args is redundant anyway.
 BTW, the warning in the docs makes such a change possible :-)  To stay
 backward compatible, the error message could also be the first element
 of the tuple.

Seb
PS: There may be problems (that I am not aware of) with an exception
keeping references to other objects.
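For what it's worth, later Pythons went roughly the way Seb suggests: since Python 3.10, NameError carries a `name` attribute and AttributeError carries `name` and `obj`. The sketch below is modern Python 3 (hence `except ... as e`), the helper names are made up, and it falls back to parsing the message only when the structured attribute is missing, i.e. on older interpreters, where it relies on exactly the unstable text the documentation warns about.

```python
import re

def undefined_name(exc: NameError):
    """Return the identifier a NameError complains about, or None."""
    name = getattr(exc, 'name', None)      # structured attribute, Python 3.10+
    if name is not None:
        return name
    match = re.search(r"name '(\w+)' is not defined", str(exc))
    return match.group(1) if match else None

def missing_attribute(exc: AttributeError):
    """Return the attribute an AttributeError complains about, or None."""
    name = getattr(exc, 'name', None)      # structured attribute, Python 3.10+
    if name is not None:
        return name
    match = re.search(r"has no attribute '(\w+)'", str(exc))
    return match.group(1) if match else None

def demo():
    """Trigger both of Seb's examples and extract the offending names."""
    found = []
    try:
        foo  # deliberately undefined
    except NameError as e:
        found.append(undefined_name(e))
    try:
        (4).foo
    except AttributeError as e:
        found.append(missing_attribute(e))
    return found

print(demo())   # ['foo', 'foo']
```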



Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
...

[mwh]
> OK, so the worst that could happen here is that moving marshal data
> from one box to another could turn one sort of NaN into another?

Right.  Assuming source and destination boxes both use 754 format, and
the implementation adjusts endianness if necessary.

Heh.  I have a vague half-memory of _some_ box that stored the two
4-byte "words" in an IEEE double in one order, but the bytes within
each word in the opposite order.  It's always something ...
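The layout Tim half-remembers is easy to describe in code. This hypothetical helper (modern Python) reverses the bytes inside each 4-byte word while keeping the word order, turning a big-endian double into that kind of mixed layout and back. A detector that only compares against the big- and little-endian patterns for 1.5 would report such a box as unknown rather than misreading it, which is the safe way to fail.

```python
import struct

def swap_bytes_within_words(data: bytes) -> bytes:
    """Reverse the byte order inside each 4-byte half of an 8-byte double.

    Applied to big-endian bytes this yields a "mixed" layout (words in
    big-endian order, bytes within each word little-endian); applying it
    again restores the original.
    """
    if len(data) != 8:
        raise ValueError('expected an 8-byte double')
    return data[3::-1] + data[7:3:-1]

big = struct.pack('>d', 1.5)           # 3f f8 00 00 00 00 00 00
mixed = swap_bytes_within_words(big)   # 00 00 f8 3f 00 00 00 00
little = struct.pack('<d', 1.5)        # 00 00 00 00 00 00 f8 3f

print(mixed.hex())                     # 0000f83f00000000
print(mixed in (big, little))          # False: neither common pattern matches
```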

> This doesn't seem very bad.

Not bad at all:

>>>> But since most people run with all FPU traps disabled, and
>>>> Python doesn't expose a way to read the FPU status flags, they
>>>> couldn't tell the difference.

>>>> Store a few small doubles at module initialization time and stare at

>>> ./configure time, surely?

>> Unsure.  Not all Python platforms _have_ "./configure time".
 
> But they all have pyconfig.h.

Yes, and then a platform porter has to understand what to
#define/#undefine, and why.  People doing cross-compilation may have
an especially confusing time of it.  Module initialization code "just
works", so I certainly understand why it doesn't appeal to the Unix
frame of mind.

>> Module initialization code is harder to screw up for that reason
>> (the code is in an obvious place then, self-contained, and doesn't
>> require any relevant knowledge of any platform porter unless/until
>> it breaks).

> Well, sure, but false negatives are not a big deal here.

Sorry, unsure what "false negative" means here.

...

> Something along these lines:
>
> double x = 1.5;
> int is_big_endian_ieee_double = sizeof(double) == 8 &&
>     memcmp(&x, "\077\370\000\000\000\000\000\000", 8) == 0;

Right, it's that easy -- at least under MSVC and gcc.


Re: [Python-Dev] New style classes and operator methods

2005-04-11 Thread Armin Rigo
Hi Greg,

On Fri, Apr 08, 2005 at 05:03:42PM +1200, Greg Ewing wrote:
> If the left and right operands are of the same class,
> and the class implements a right operand method but
> not a left operand method, the right operand method
> is not called. Instead, two attempts are made to call
> the left operand method.

This is not a general rule.  The rule is that if both operands are of the
same class, only the non-reversed method is ever called.  The confusing bit
is having it called twice.  Funnily enough, this only happens for some
operators (I think only add and mul).  The reason is that internally, the C
core distinguishes between number addition and sequence concatenation, and
between number multiplication and sequence repetition.  So __add__() and
__mul__() are called twice: once as a numeric operation and once as a
sequence operation...

Could be fixed with more strange special cases in abstract.c, but I'm not sure
it's worth it.
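A small sketch of the first half of Armin's rule, in modern Python 3 (the class names are made up). When both operands have the same type, Python tries only the left operand's __add__ and never the reflected __radd__; with different types the reflected method gets its turn. On current CPython the counter records a single call to __add__, so the double call described above looks like an artifact of the implementation of that time; treat this as an observation, not a language guarantee.

```python
class OnlyReflected:
    """Defines __radd__ but no __add__."""
    def __init__(self):
        self.calls = []

    def __radd__(self, other):
        self.calls.append('__radd__')
        return 'reflected'


class Declines:
    """Defines __add__ but always declines by returning NotImplemented."""
    add_calls = 0

    def __add__(self, other):
        Declines.add_calls += 1
        return NotImplemented


def try_add(a, b):
    """Return a + b, or the TypeError class if the addition is refused."""
    try:
        return a + b
    except TypeError:
        return TypeError


left, right = OnlyReflected(), OnlyReflected()
print(try_add(left, right))     # <class 'TypeError'>: __radd__ never tried
print(right.calls)              # []
print(1 + OnlyReflected())      # reflected: int declines, then __radd__ runs

print(try_add(Declines(), Declines()))  # <class 'TypeError'>
print(Declines.add_calls)               # 1 on current CPython
```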


Armin


Re: [Python-Dev] Re: Re: Re: marshal / unmarshal

2005-04-11 Thread Martin v. Löwis
Fredrik Lundh wrote:
> I was under the impression that the marshal format has been stable for
> quite a long time (people are using it for various RPC protocols, among
> other things).  I might be wrong.

Python 2.4 introduced support for string sharing in marshal files, with
an option to turn sharing off if an application needs the old format for
backwards compatibility.
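A rough way to see what sharing buys, in modern Python. The 2.4 feature Martin describes (marshal version 1) shares interned strings; current CPython's version 3 and later generalize this to back-references for any object that appears more than once, while version 2 still writes every occurrence in full. The sizes and identity checks below reflect current CPython's behaviour, not the exact 2.4 mechanism.

```python
import marshal

# Built at run time so it is neither interned nor a compile-time constant.
s = '-'.join(str(i) for i in range(300))
payload = [s, s]                        # the same string object twice

unshared = marshal.dumps(payload, 2)    # each occurrence written in full
shared = marshal.dumps(payload, 4)      # second occurrence is a back-reference

print(len(unshared), len(shared))       # the shared form is roughly half the size

a = marshal.loads(shared)
b = marshal.loads(unshared)
print(a[0] is a[1])   # True: loading restores the sharing
print(b[0] is b[1])   # False: two equal but distinct strings
```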

Regards,
Martin


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
I've just submitted http://python.org/sf/1180995 which adds format
codes for binary marshalling of floats if version > 1, but it doesn't
quite have the effect I expected (see below):

>>> inf = 1e308*1e308
>>> nan = inf/inf
>>> marshal.dumps(nan, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: unmarshallable object

frexp(nan, &e), it turns out, returns nan, which results in this (to
be expected if you read _PyFloat_Pack8 and know that I'm using a
new-ish GCC -- it might be different for MSVC 6).

Also (this is the same thing, really):

>>> struct.pack('>d', inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

Although I was a little surprised by this:

>>> struct.pack('d', inf)
'\x7f\xf0\x00\x00\x00\x00\x00\x00'

(this is a big-endian system).  Again, reading the source explains the
behaviour.

Tim Peters <[EMAIL PROTECTED]> writes:

> ...
>
> [mwh]
>> OK, so the worst that could happen here is that moving marshal data
>> from one box to another could turn one sort of NaN into another?
>
> Right.  Assuming source and destination boxes both use 754 format, and
> the implementation adjusts endianness if necessary.

Well, I was assuming marshal would do floats little-endian-wise, as it
does for integers.

> Heh.  I have a vague half-memory of _some_ box that stored the two
> 4-byte "words" in an IEEE double in one order, but the bytes within
> each word in the opposite order.  It's always something ...

I recall stories of machines that stored the bytes of a long in some
crazy order like that.  I think Python would already be broken on such
a system, but, also, don't care.

>>>>> Store a few small doubles at module initialization time and stare at
>>>>
>>>> ./configure time, surely?
>
>>> Unsure.  Not all Python platforms _have_ "./configure time".
>  
>> But they all have pyconfig.h.
>
> Yes, and then a platform porter has to understand what to
> #define/#undefine, and why.  People doing cross-compilation may have
> an especially confusing time of it.

Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all
that much (this is what I meant by false negatives below).

> Module initialization code "just works", so I certainly understand
> why it doesn't appeal to the Unix frame of mind.

It just strikes me as silly to test at runtime something that is so
obviously not going to change between invocations.  But it's not a big
deal either way.

> ...
>
>> Something along these lines:
>>
>> double x = 1.5;
>> int is_big_endian_ieee_double = sizeof(double) == 8 &&
>>     memcmp(&x, "\077\370\000\000\000\000\000\000", 8) == 0;
>
> Right, it's that easy

Cool.

> -- at least under MSVC and gcc.

Huh?  Now it's my turn to be confused (for starters, under MSVC ieee
doubles really can be assumed...).

Cheers,
mwh 

-- 
  You sound surprised.  We're talking about a government department
  here - they have procedures, not intelligence.
-- Ben Hutchings, cam.misc


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
[Michael Hudson]
> I've just submitted http://python.org/sf/1180995 which adds format
> codes for binary marshalling of floats if version > 1, but it doesn't
> quite have the effect I expected (see below):

> >>> inf = 1e308*1e308
> >>> nan = inf/inf
> >>> marshal.dumps(nan, 2)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> ValueError: unmarshallable object

I don't understand.  Does "binary marshalling" _not_ mean just copying
the bytes on a 754 platform?  If so, that won't work.  I pointed out
the relevant comments before:

/* The pack routines write 4 or 8 bytes, starting at p.
...
 * Bug:  What this does is undefined if x is a NaN or infinity.
 * Bug:  -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);

> frexp(nan, &e), it turns out, returns nan,

This is an undefined case in C89 (all 754 special values are).

> which results in this (to be expected if you read _PyFloat_Pack8 and
> know that I'm using a new-ish GCC -- it might be different for MSVC 6).
>
> Also (this is the same thing, really):

Right.  So is pickling with proto >= 1.  Changing the pack/unpack
routines to copy bytes instead (when possible) "fixes" all of these
things at one stroke, on boxes where it applies.
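For comparison, current CPython, whose pack/unpack routines copy (and reverse, if needed) the bytes when they detect an IEEE-754 platform, handles every case that failed above. A quick check in modern Python; the byte strings are the standard 754 encodings.

```python
import math
import marshal
import struct

inf = float('inf')
nan = inf - inf   # a NaN, as in the sessions above

# Infinity and both signed zeros pack to their exact 754 encodings.
print(struct.pack('>d', inf).hex())    # 7ff0000000000000
print(struct.pack('>d', 0.0).hex())    # 0000000000000000
print(struct.pack('>d', -0.0).hex())   # 8000000000000000

# Version-2 marshal round-trips the special values instead of raising.
for value in (inf, -inf, nan, -0.0):
    restored = marshal.loads(marshal.dumps(value, 2))
    if math.isnan(value):
        assert math.isnan(restored)
    else:
        # copysign tells -0.0 from 0.0, which == cannot.
        assert restored == value
        assert math.copysign(1, restored) == math.copysign(1, value)
```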
 
> >>> struct.pack('>d', inf)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> SystemError: frexp() result out of range
>
> Although I was a little surprised by this:
>
> >>> struct.pack('d', inf)
> '\x7f\xf0\x00\x00\x00\x00\x00\x00'
>
> (this is a big-endian system).  Again, reading the source explains the
> behaviour.

>>> OK, so the worst that could happen here is that moving marshal data
>>> from one box to another could turn one sort of NaN into another?

>> Right.  Assuming source and destination boxes both use 754 format, and
>> the implementation adjusts endianness if necessary.

> Well, I was assuming marshal would do floats little-endian-wise, as it
> does for integers.

Then on a big-endian 754 system, loads() will have to reverse the
bytes in the little-endian marshal bytestring, and dumps() likewise. 
That's all "if necessary" meant -- sometimes cast + memcpy isn't
enough, and regardless of which direction marshal decides to use.
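A sketch of what "if necessary" amounts to, assuming a 754 platform whose doubles use the same byte order as its integers (Python's `sys.byteorder`). The helper names are hypothetical; the wire format is little-endian, as mwh proposes for marshal.

```python
import struct
import sys

def to_wire(x: float) -> bytes:
    """Encode x as 8 little-endian IEEE-754 bytes."""
    raw = struct.pack('d', x)        # native mode: the double's in-memory bytes
    if sys.byteorder == 'big':
        raw = raw[::-1]              # big-endian box: reverse on the way out
    return raw

def from_wire(data: bytes) -> float:
    """Decode 8 little-endian IEEE-754 bytes back into a float."""
    if len(data) != 8:
        raise ValueError('expected 8 bytes')
    if sys.byteorder == 'big':
        data = data[::-1]            # ...and reverse again on the way in
    return struct.unpack('d', data)[0]

print(to_wire(1.5).hex())       # 000000000000f83f
print(from_wire(to_wire(1.5)))  # 1.5
```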

>> Heh.  I have a vague half-memory of _some_ box that stored the two
>> 4-byte "words" in an IEEE double in one order, but the bytes within
>> each word in the opposite order.  It's always something ...

> I recall stories of machines that stored the bytes of a long in some
> crazy order like that.  I think Python would already be broken on such
> a system, but, also, don't care.

Python does very little that depends on internal native byte order,
and C hides it in the absence of casting abuse.  Copying internal
native bytes across boxes is plain ugly -- can't get more brittle than
that.  In this case it looks like a good tradeoff, though.

> ...
> Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all
> that much (this is what I meant by false negatives below).
> ...
> It just strikes me as silly to test at runtime something that is so
> obviously not going to change between invocations.  But it's not a big
> deal either way.

It isn't to me either.  It just strikes me as silly to give porters
another thing to wonder about and screw up when it's possible to solve
it completely with a few measly runtime cycles.

>>> Something along these lines:
>>>
>>> double x = 1.5;
>>> int is_big_endian_ieee_double = sizeof(double) == 8 &&
>>>     memcmp(&x, "\077\370\000\000\000\000\000\000", 8) == 0;

>> Right, it's that easy

> Cool.

>> -- at least under MSVC and gcc.
 
> Huh?  Now it's my turn to be confused (for starters, under MSVC ieee
> doubles really can be assumed...).

So you have no argument with the "at least under MSVC" part.
There's nothing to worry about here -- I was just tweaking.