Re: [Python-Dev] Re: marshal / unmarshal

2005-04-15 Thread Jack Jansen
On 14-apr-05, at 15:08, David Robinow wrote:
On 4/11/05, Tim Peters [EMAIL PROTECTED] wrote:
Heh.  I have a vague half-memory of _some_ box that stored the two
4-byte words in an IEEE double in one order, but the bytes within
each word in the opposite order.  It's always something ...
 I believe this was the Floating Instruction Set on the PDP 11/35.
The fact that it's still remembered 30 years later shows how unusual 
it was.
I think it was actually logical, because all PDP-11s (there were 2 or
3 FPU instruction sets/architectures in the family, IIRC) stored 32-bit
integers in middle-endian order (high-order word first, but low-order
byte first within each word).

But note that neither of the PDP-11 FPUs was IEEE; that was a much
later invention. At least, I didn't come across it until much later:-)
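
To make the middle-endian layout concrete, here is a minimal Python sketch. The function name `pdp_endian_bytes` is hypothetical, and the code only illustrates the ordering described above (high-order 16-bit word first, low-order byte first within each word); it is not taken from any PDP-11 documentation.

```python
def pdp_endian_bytes(value):
    """Lay out a 32-bit unsigned integer in middle-endian order."""
    hi = (value >> 16) & 0xFFFF  # high-order 16-bit word, stored first
    lo = value & 0xFFFF          # low-order 16-bit word, stored second
    # Within each word the low-order byte comes first.
    return hi.to_bytes(2, "little") + lo.to_bytes(2, "little")
```

With this layout the value 0x0A0B0C0D would be stored as the bytes 0B 0A 0D 0C.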
--
Jack Jansen, [EMAIL PROTECTED], http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma 
Goldman

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-12 Thread Michael Hudson
My mail is experiencing random delays of up to a few hours at the
moment.  I wrote this before I saw your comments on my patch.

Tim Peters [EMAIL PROTECTED] writes:

 [Michael Hudson]
 I've just submitted http://python.org/sf/1180995 which adds format
 codes for binary marshalling of floats if version > 1, but it doesn't
 quite have the effect I expected (see below):

 >>> inf = 1e308*1e308
 >>> nan = inf/inf
 >>> marshal.dumps(nan, 2)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 ValueError: unmarshallable object

 I don't understand.  Does binary marshalling _not_ mean just copying
 the bytes on a 754 platform?

No, it means using _PyFloat_Pack8/Unpack8, like the patch description
says.  Making those functions just fiddle bytes when they can I regard
as a separate project (watch a patch manager near you, though).

 If so, that won't work.

I can tell! <wink>

 Right.  Assuming source and destination boxes both use 754 format, and
 the implementation adjusts endianness if necessary.

 Well, I was assuming marshal would do floats little-endian-wise, as it
 does for integers.

 Then on a big-endian 754 system, loads() will have to reverse the
 bytes in the little-endian marshal bytestring, and dumps() likewise. 

Really?  Even I had worked this out...

 Heh.  I have a vague half-memory of _some_ box that stored the two
 4-byte words in an IEEE double in one order, but the bytes within
 each word in the opposite order.  It's always something ...

 I recall stories of machines that stored the bytes of long in some
 crazy order like that.  I think Python would already be broken on such
 a system, but, also, don't care.

 Python does very little that depends on internal native byte order,
 and C hides it in the absence of casting abuse.  

This surely does:

PyObject *
PyLong_FromLongLong(PY_LONG_LONG ival)
{
    PY_LONG_LONG bytes = ival;
    int one = 1;
    return _PyLong_FromByteArray(
            (unsigned char *)&bytes,
            SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1);
}

It occurs that in the IEEE case, special values can be detected with
reliability -- by picking the exponent field out by force -- and a
warning emitted or exception raised.  Good idea?  Hard to say, to me.
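
As a rough illustration of "picking the exponent field out by force", here is a minimal Python sketch. It assumes the value is packed as a big-endian IEEE-754 double via the struct module; the function name `classify_double` is hypothetical and this is not the code from the patch.

```python
import struct

def classify_double(x):
    """Classify a float by inspecting the exponent field of its 754 bits."""
    # Pack as a big-endian IEEE-754 double, then reread it as a 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (bits >> 52) & 0x7FF    # the 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)  # the 52 fraction bits
    if exponent == 0x7FF:
        # All exponent bits set: infinity if the fraction is zero, else NaN.
        return "nan" if mantissa else "inf"
    return "finite"
```

A caller could warn or raise whenever the result is not "finite".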

Cheers,
mwh

Oh, by the way: http://python.org/sf/1181301

-- 
  It is time-consuming to produce high-quality software. However,
  that should not alone be a reason to give up the high standards
  of Python development.  -- Martin von Loewis, python-dev


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-12 Thread Tim Peters
...

[mwh]
 I recall stories of machines that stored the bytes of long in some
 crazy order like that.  I think Python would already be broken on such
 a system, but, also, don't care.

[Tim]
 Python does very little that depends on internal native byte order,
 and C hides it in the absence of casting abuse.

[mwh]
 This surely does:

 PyObject *
 PyLong_FromLongLong(PY_LONG_LONG ival)
 {
     PY_LONG_LONG bytes = ival;
     int one = 1;
     return _PyLong_FromByteArray(
             (unsigned char *)&bytes,
             SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1);
 }

Yes, that's "casting abuse".  Python does very little of that.  If it
becomes necessary, it's straightforward but long-winded to rewrite the
above in wholly portable C (peel the bytes out of ival,
least-significant first, via shifting and masking 8 times; "ival &
0xff" is the least-significant byte regardless of memory storage
order; etc).  BTW, the IS_LITTLE_ENDIAN macro also relies on casting
abuse, and more deeply than does the visible cast there.
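
The shifting-and-masking idea can be sketched in Python as follows. This only illustrates the technique and is not the portable C rewrite itself; the function name `long_long_to_le_bytes` and the default size of 8 are assumptions made for the example.

```python
def long_long_to_le_bytes(ival, size=8):
    """Peel bytes out of an integer, least-significant first."""
    out = bytearray()
    for _ in range(size):
        out.append(ival & 0xFF)  # low byte, independent of memory layout
        ival >>= 8               # arithmetic shift, so negatives keep their sign
    return bytes(out)
```

Because only arithmetic on the value is used, the result does not depend on how the platform stores integers in memory.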
 
 It occurs that in the IEEE case, special values can be detected with
 reliability -- by picking the exponent field out by force

Right, that works for NaNs and infinities; signed zeroes are a bit
trickier to detect.

 -- and a warning emitted or exception raised.  Good idea?  Hard to say, to me.

It's not possible to _create_ a NaN or infinity from finite operands
in 754 without signaling some exceptional condition.  Once you have
one, though, there's generally nothing exceptional about _using_ it. 
Sometimes there is, like +Inf - +Inf or Inf / Inf, but not generally. 
Using a quiet NaN never signals; using a signaling NaN almost always
signals.

So packing a nan or inf shouldn't complain.  On a 754 box, unpacking
one shouldn't complain either.  Unpacking a nan or inf on a non-754
box probably should complain, since there's in general nothing it can
be unpacked _to_ that makes any sense (errors should never pass
silently).


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-12 Thread Michael Hudson
Tim Peters [EMAIL PROTECTED] writes:

 ...

 [mwh]
 I recall stories of machines that stored the bytes of long in some
 crazy order like that.  I think Python would already be broken on such
 a system, but, also, don't care.

 [Tim]
 Python does very little that depends on internal native byte order,
 and C hides it in the absence of casting abuse.

 [mwh]
 This surely does:

 PyObject *
 PyLong_FromLongLong(PY_LONG_LONG ival)
 {
     PY_LONG_LONG bytes = ival;
     int one = 1;
     return _PyLong_FromByteArray(
             (unsigned char *)&bytes,
             SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1);
 }

 Yes, that's "casting abuse".  Python does very little of that.  If it
 becomes necessary, it's straightforward but long-winded to rewrite the
 above in wholly portable C (peel the bytes out of ival,
 least-significant first, via shifting and masking 8 times; "ival &
 0xff" is the least-significant byte regardless of memory storage
 order; etc).

Not arguing with that.

 BTW, the IS_LITTLE_ENDIAN macro also relies on casting abuse, and
 more deeply than does the visible cast there.

I'd like to claim that was part of my point :)

There is a certain, small level of assumption in Python that
big-endian or little-endian is the only question to ask -- and I
don't think that's a problem!

Even this isn't a big deal, at least if we choose a more
interesting 'probe value' than 1.5: it will just lead to an oddball
box degrading to the non-IEEE code.

 It occurs that in the IEEE case, special values can be detected with
 reliability -- by picking the exponent field out by force

 Right, that works for NaNs and infinities; signed zeroes are a bit
 trickier to detect.

Hmm.  Don't think they're such a big deal.

 -- and a warning emitted or exception raised.  Good idea?  Hard to
 say, to me.

 It's not possible to _create_ a NaN or infinity from finite operands
 in 754 without signaling some exceptional condition.  Once you have
 one, though, there's generally nothing exceptional about _using_ it. 
 Sometimes there is, like +Inf - +Inf or Inf / Inf, but not generally. 
 Using a quiet NaN never signals; using a signaling NaN almost always
 signals.

 So packing a nan or inf shouldn't complain.  On a 754 box, unpacking
 one shouldn't complain either.  Unpacking a nan or inf on a non-754
 box probably should complain, since there's in general nothing it can
 be unpacked _to_ that makes any sense (errors should never pass
 silently).

This sounds like good behaviour to me.  I'll try to update the patch
soon.

Cheers,
mwh

-- 
  BUGS   Never use this function.  This function modifies its first
 argument.   The  identity  of  the delimiting character is
 lost.  This function cannot be used on constant strings.
-- the glibc manpage for strtok(3)


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
Tim Peters [EMAIL PROTECTED] writes:

 The 754 standard doesn't say anything about how the difference between
 signaling and quiet NaNs is represented.  So it's possible that a qNaN
 on one box would look like an sNaN on a different box, and vice
 versa.  But since most people run with all FPU traps disabled, and
 Python doesn't expose a way to read the FPU status flags, they
 couldn't tell the difference.

OK.  Do you have any intuition as to whether 754 implementations
actually *do* differ on this point?

 Copying bytes works perfectly for all other cases (signed zeroes,
 non-zero finites, infinities), because their representations are
 wholly defined, although it's possible that a subnormal on one box
 will be treated like a zero (with the same sign) on a
 partially-conforming box.

I'd find struggling to care about that pretty hard.

 [1] I'm slightly worried about oddball systems that do insane things
with the FPU by default -- but don't think the mooted change would
make things any worse.

 Sorry, don't know what that means.

Neither do I, now.  Oh well <wink>.

 The question, of course, is how to tell.

 Store a few small doubles at module initialization time and stare at

./configure time, surely?

 their bits.  That's enough to settle whether a 754 format is in use,
 and, if it is, whether it's big-endian or little-endian.

Do you have a pointer to code that does this?  Googling around the
subject appears to turn up lots of Python stuff...

 [2] Exaggeration, I realize -- but how many non 754 systems are out
there?  How many will see Python 2.5?

 No idea here.  The existing pack routines strive to do a good job of
 _creating_ an IEEE-754-format representation regardless of platform
 representation.  I assume that code would still be present, so
 oddball platforms would be left no worse off than they are now.

Well, yes, given the above.  The text this footnote was attached to
was asking if just assuming 754 float formats would inconvenience
anyone.

Cheers,
mwh

-- 
  I don't have any special knowledge of all this. In fact, I made all
  the above up, in the hope that it corresponds to reality.
-- Mark Carroll, ucam.chat


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
[Tim]
 The 754 standard doesn't say anything about how the difference between
 signaling and quiet NaNs is represented.  So it's possible that a qNaN
 on one box would look like an sNaN on a different box, and vice
 versa.  But since most people run with all FPU traps disabled, and
 Python doesn't expose a way to read the FPU status flags, they
 couldn't tell the difference.

[mwh]
 OK.  Do you have any intuition as to whether 754 implementations
 actually *do* differ on this point?

Not anymore -- hasn't been part of my job, or a hobby, for over a
decade.  There were differences a decade+ ago.  All NaNs have all
exponent bits set, and at least one mantissa bit set, and every bit
pattern of that form represents a NaN.  That's all the standard says. 
The most popular way to distinguish quiet from signaling NaNs keyed
off the most-significant mantissa bit:  set for a qNaN, clear for an
sNaN.  It's possible that all 754 HW does that now.
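
A minimal Python sketch of that "most popular" convention, working on raw 64-bit patterns so that no hardware gets a chance to quiet a signaling NaN along the way. The names `QUIET_BIT` and `nan_kind` are hypothetical, and the convention is assumed here, not something the standard discussed above guarantees.

```python
QUIET_BIT = 1 << 51  # most-significant of the 52 mantissa bits

def nan_kind(bits):
    """Return 'quiet', 'signaling', or None for a 64-bit double pattern."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent != 0x7FF or mantissa == 0:
        return None  # finite value or infinity, not a NaN
    # Assumed convention: top mantissa bit set means quiet, clear means signaling.
    return "quiet" if mantissa & QUIET_BIT else "signaling"
```

On a box using the opposite convention, the two labels would simply swap.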

There's at least still this:  Pentium hardware adds a third
not-a-number possibility.  In addition to 754's quiet and signaling
NaNs, it also has indeterminate values.  Here w/ native Windows
Python 2.4 on a Pentium:

>>> inf = 1e300 * 1e300
>>> inf - inf   # indeterminate
-1.#IND
>>> - _  # but the negation of IND is a quiet NaN
1.#QNAN


Do the same thing under Cygwin Python on the same box and it prints NaN twice.

Do people care about this?  I don't know.  It seems unlikely -- in
effect, IND just gives a special string name to a single one of the
many bit patterns that represent a quiet NaN.  OTOH, Pentium hardware
still preserves this distinction, and MS library docs do too.  IND
isn't part of the 754 standard (although, IIRC, it was part of a
pre-standard draft, which Intel implemented and is now stuck with).

 Copying bytes works perfectly for all other cases (signed zeroes,
 non-zero finites, infinities), because their representations are
 wholly defined, although it's possible that a subnormal on one box
 will be treated like a zero (with the same sign) on a
 partially-conforming box.

 I'd find struggling to care about that pretty hard.

Me too.

 The question, of course, is how to tell.

 Store a few small doubles at module initialization time and stare at

 ./configure time, surely?

Unsure.  Not all Python platforms _have_ ./configure time.  Module
initialization code is harder to screw up for that reason (the code is
in an obvious place then, self-contained, and doesn't require any
relevant knowledge of any platform porter unless/until it breaks).

 their bits.  That's enough to settle whether a 754 format is in use,
 and, if it is, whether it's big-endian or little-endian.

 Do you have a pointer to code that does this?

No.  Pemberton's enquire.c contains enough code to do it.  Given how
few distinct architectures still exist, it's probably enough to store
just double x = 1.5 and stare at it.
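
The "store 1.5 and stare at it" probe can be sketched from Python, assuming that struct's native "d" format hands back the platform's raw double bytes. The names `BIG`, `LITTLE` and `detect_double_format` are hypothetical, and this is an illustration, not the C code that would live in module initialization.

```python
import struct

# 1.5 as an IEEE-754 double is 0x3FF8000000000000.
BIG = bytes([0o077, 0o370, 0, 0, 0, 0, 0, 0])  # big-endian byte order
LITTLE = BIG[::-1]                              # little-endian byte order

def detect_double_format():
    """Guess the native double format from the raw bytes of 1.5."""
    native = struct.pack("d", 1.5)  # native mode: platform representation
    if native == BIG:
        return "IEEE, big-endian"
    if native == LITTLE:
        return "IEEE, little-endian"
    return "unknown"  # oddball box: fall back to the generic pack code
```

An "unknown" result corresponds to degrading to the existing non-IEEE code path.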

 [2] Exaggeration, I realize -- but how many non 754 systems are out
there?  How many will see Python 2.5?

 No idea here.  The existing pack routines strive to do a good job of
 _creating_ an IEEE-754-format representation regardless of platform
 representation.  I assume that code would still be present, so
 oddball platforms would be left no worse off than they are now.
 
 Well, yes, given the above.  The text this footnote was attached to
 was asking if just assuming 754 float formats would inconvenience
 anyone.

I think I'm still missing your intent here.  If you're asking whether
Python can blindly assume that 754 is in use, I'd say that's
undesirable but defensible if necessary.


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
Tim Peters [EMAIL PROTECTED] writes:

 [Tim]
 The 754 standard doesn't say anything about how the difference between
 signaling and quiet NaNs is represented.  So it's possible that a qNaN
 on one box would look like an sNaN on a different box, and vice
 versa.  But since most people run with all FPU traps disabled, and
 Python doesn't expose a way to read the FPU status flags, they
 couldn't tell the difference.

 [mwh]
 OK.  Do you have any intuition as to whether 754 implementations
 actually *do* differ on this point?

 Not anymore -- hasn't been part of my job, or a hobby, for over a
 decade.  There were differences a decade+ ago.  All NaNs have all
 exponent bits set, and at least one mantissa bit set, and every bit
 pattern of that form represents a NaN.  That's all the standard says. 
 The most popular way to distinguish quiet from signaling NaNs keyed
 off the most-significant mantissa bit:  set for a qNaN, clear for an
 sNaN.  It's possible that all 754 HW does that now.

[snip details]

OK, so the worst that could happen here is that moving marshal data
from one box to another could turn one sort of NaN into another?  This
doesn't seem very bad.

[denorms]

 I'd find struggling to care about that pretty hard.

 Me too.

Good.

 The question, of course, is how to tell.

 Store a few small doubles at module initialization time and stare at

 ./configure time, surely?

 Unsure.  Not all Python platforms _have_ ./configure time.  

But they all have pyconfig.h.

 Module initialization code is harder to screw up for that reason
 (the code is in an obvious place then, self-contained, and doesn't
 require any relevant knowledge of any platform porter unless/until
 it breaks).

Well, sure, but false negatives are not a big deal here.

 their bits.  That's enough to settle whether a 754 format is in use,
 and, if it is, whether it's big-endian or little-endian.

 Do you have a pointer to code that does this?

 No.  Pemberton's enquire.c contains enough code to do it.  

Yikes!  And much else besides.

 Given how few distinct architectures still exist, it's probably
 enough to store just double x = 1.5 and stare at it.

Something along these lines:

double x = 1.5;
is_big_endian_ieee_double = sizeof(double) == 8 &&
   memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

?

[me being obscure]
 I think I'm still missing your intent here.  If you're asking whether
 Python can blindly assume that 754 is in use, I'd say that's
 undesirable but defensible if necessary.

Yes, that's what I was asking, in a rather obscure way.

Cheers,
mwh

-- 
  Strangely enough  I saw just such a beast at  the grocery store
  last night. Starbucks sells Javachip. (It's ice cream, but that
  shouldn't be an obstacle for the Java marketing people.)
 -- Jeremy Hylton, 29 Apr 1997


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
...

[mwh]
 OK, so the worst that could happen here is that moving marshal data
 from one box to another could turn one sort of NaN into another?

Right.  Assuming source and destination boxes both use 754 format, and
the implementation adjusts endianness if necessary.

Heh.  I have a vague half-memory of _some_ box that stored the two
4-byte words in an IEEE double in one order, but the bytes within
each word in the opposite order.  It's always something ...

 This doesn't seem very bad.

Not bad at all:

But since most people run with all FPU traps disabled, and
Python doesn't expose a way to read the FPU status flags, they
couldn't tell the difference.

 Store a few small doubles at module initialization time and stare at

 ./configure time, surely?

 Unsure.  Not all Python platforms _have_ ./configure time.
 
 But they all have pyconfig.h.

Yes, and then a platform porter has to understand what to
#define/#undefine, and why.  People doing cross-compilation may have
an especially confusing time of it.  Module initialization code just
works, so I certainly understand why it doesn't appeal to the Unix
frame of mind <wink>.

 Module initialization code is harder to screw up for that reason
 (the code is in an obvious place then, self-contained, and doesn't
 require any relevant knowledge of any platform porter unless/until
 it breaks).

 Well, sure, but false negatives here are not a big deal here.

Sorry, unsure what "false negative" means here.

...

 Something along these lines:

 double x = 1.5;
 is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

Right, it's that easy -- at least under MSVC and gcc.


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Michael Hudson
I've just submitted http://python.org/sf/1180995 which adds format
codes for binary marshalling of floats if version > 1, but it doesn't
quite have the effect I expected (see below):

>>> inf = 1e308*1e308
>>> nan = inf/inf
>>> marshal.dumps(nan, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: unmarshallable object

frexp(nan, &e), it turns out, returns nan, which results in this (to
be expected if you read _PyFloat_Pack8 and know that I'm using a
new-ish GCC -- it might be different for MSVC 6).

Also (this is the same thing, really):

>>> struct.pack('>d', inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

Although I was a little surprised by this:

>>> struct.pack('d', inf)
'\x7f\xf0\x00\x00\x00\x00\x00\x00'

(this is a big-endian system).  Again, reading the source explains the
behaviour.
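
For comparison, a small sketch of the same two modes in current Python 3. That is an assumption beyond this thread, which concerns Python 2.4 and the planned 2.5; there, standard mode packs infinities without complaint. The variable names are hypothetical.

```python
import struct
import sys

inf = float("inf")

std_big = struct.pack(">d", inf)     # standard mode, big-endian 754 bytes
std_little = struct.pack("<d", inf)  # standard mode, little-endian 754 bytes
native = struct.pack("d", inf)       # native mode: the platform's own bytes

# On a 754 box the native bytes should match one of the standard forms.
expected_native = std_big if sys.byteorder == "big" else std_little
```

The big-endian result is the same byte string shown in the session above.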

Tim Peters [EMAIL PROTECTED] writes:

 ...

 [mwh]
 OK, so the worst that could happen here is that moving marshal data
 from one box to another could turn one sort of NaN into another?

 Right.  Assuming source and destination boxes both use 754 format, and
 the implementation adjusts endianness if necessary.

Well, I was assuming marshal would do floats little-endian-wise, as it
does for integers.

 Heh.  I have a vague half-memory of _some_ box that stored the two
 4-byte words in an IEEE double in one order, but the bytes within
 each word in the opposite order.  It's always something ...

I recall stories of machines that stored the bytes of long in some
crazy order like that.  I think Python would already be broken on such
a system, but, also, don't care.

 Store a few small doubles at module initialization time and stare at

 ./configure time, surely?

 Unsure.  Not all Python platforms _have_ ./configure time.
  
 But they all have pyconfig.h.

 Yes, and then a platform porter has to understand what to
 #define/#undefine, and why.  People doing cross-compilation may have
 an especially confusing time of it.

Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all
that much (this is what I meant by false negatives below).

 Module initialization code just works, so I certainly understand
 why it doesn't appeal to the Unix frame of mind <wink>.

It just strikes me as silly to test at runtime something that is so
obviously not going to change between invocations.  But it's not a big
deal either way.

 ...

 Something along these lines:

 double x = 1.5;
 is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

 Right, it's that easy

Cool.

 -- at least under MSVC and gcc.

Huh?  Now it's my turn to be confused (for starters, under MSVC ieee
doubles really can be assumed...).

Cheers,
mwh 

-- 
  You sound surprised.  We're talking about a government department
  here - they have procedures, not intelligence.
-- Ben Hutchings, cam.misc


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-11 Thread Tim Peters
[Michael Hudson]
 I've just submitted http://python.org/sf/1180995 which adds format
 codes for binary marshalling of floats if version > 1, but it doesn't
 quite have the effect I expected (see below):

 >>> inf = 1e308*1e308
 >>> nan = inf/inf
 >>> marshal.dumps(nan, 2)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 ValueError: unmarshallable object

I don't understand.  Does binary marshalling _not_ mean just copying
the bytes on a 754 platform?  If so, that won't work.  I pointed out
the relevant comments before:

/* The pack routines write 4 or 8 bytes, starting at p.
...
 * Bug:  What this does is undefined if x is a NaN or infinity.
 * Bug:  -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);

 frexp(nan, &e), it turns out, returns nan,

This is an undefined case in C89 (all 754 special values are).

 which results in this (to be expected if you read _PyFloat_Pack8 and
 know that I'm using a new-ish GCC -- it might be different for MSVC 6).

 Also (this is the same thing, really):

Right.  So is pickling with proto >= 1.  Changing the pack/unpack
routines to copy bytes instead (when possible) fixes all of these
things at one stroke, on boxes where it applies.
 
 >>> struct.pack('>d', inf)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 SystemError: frexp() result out of range

 Although I was a little surprised by this:

 >>> struct.pack('d', inf)
 '\x7f\xf0\x00\x00\x00\x00\x00\x00'

 (this is a big-endian system).  Again, reading the source explains the
 behaviour.

 OK, so the worst that could happen here is that moving marshal data
 from one box to another could turn one sort of NaN into another?

 Right.  Assuming source and destination boxes both use 754 format, and
 the implementation adjusts endianness if necessary.

 Well, I was assuming marshal would do floats little-endian-wise, as it
 does for integers.

Then on a big-endian 754 system, loads() will have to reverse the
bytes in the little-endian marshal bytestring, and dumps() likewise. 
That's all "if necessary" meant -- sometimes cast + memcpy isn't
enough, and regardless of which direction marshal decides to use.
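
A minimal sketch of the byte-reversal step, assuming a box whose native doubles are 754 and plainly big- or little-endian, and assuming the wire format is little-endian as suggested for marshal. The names `dump_double_le` and `load_double_le` are hypothetical; this is not the marshal implementation.

```python
import struct
import sys

def dump_double_le(x):
    """Copy the native bytes of x, reversing them on a big-endian box."""
    raw = struct.pack("d", x)  # native representation
    return raw if sys.byteorder == "little" else raw[::-1]

def load_double_le(data):
    """Inverse of dump_double_le: restore native order, then reinterpret."""
    raw = data if sys.byteorder == "little" else data[::-1]
    return struct.unpack("d", raw)[0]
```

On a little-endian box both functions reduce to a plain copy; only big-endian boxes pay for the reversal.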

 Heh.  I have a vague half-memory of _some_ box that stored the two
 4-byte words in an IEEE double in one order, but the bytes within
 each word in the opposite order.  It's always something ...

 I recall stories of machines that stored the bytes of long in some
 crazy order like that.  I think Python would already be broken on such
 a system, but, also, don't care.

Python does very little that depends on internal native byte order,
and C hides it in the absence of casting abuse.  Copying internal
native bytes across boxes is plain ugly -- can't get more brittle than
that.  In this case it looks like a good tradeoff, though.

 ...
 Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all
 that much (this is what I meant by false negatives below).
 ...
 It just strikes me as silly to test at runtime something that is so
 obviously not going to change between invocations.  But it's not a big
 deal either way.

It isn't to me either.  It just strikes me as silly to give porters
another thing to wonder about and screw up when it's possible to solve
it completely with a few measly runtime cycles <wink>.

 Something along these lines:

 double x = 1.5;
 is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

 Right, it's that easy

 Cool.

 -- at least under MSVC and gcc.
 
 Huh?  Now it's my turn to be confused (for starters, under MSVC ieee
 doubles really can be assumed...).

So you have no argument with the "at least under MSVC" part <wink>.
There's nothing to worry about here -- I was just tweaking.


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-10 Thread Tim Peters
marshal shouldn't be representing doubles as decimal strings to begin
with.  All code for (de)serializing C doubles should go thru
_PyFloat_Pack8() and _PyFloat_Unpack8().  cPickle (proto >= 1) and
struct (std mode) already do; marshal is the oddball.

But as the docs (floatobject.h) for these say:

...
 * Bug:  What this does is undefined if x is a NaN or infinity.
 * Bug:  -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);
...
 * Bug:  What this does is undefined if the string represents a NaN or
 * infinity.
 */
PyAPI_FUNC(double) _PyFloat_Unpack4(const unsigned char *p, int le);
PyAPI_FUNC(double) _PyFloat_Unpack8(const unsigned char *p, int le);
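
To show why losing the sign of zero counts as a bug, here is a short Python 3 sketch (an assumption beyond this thread, which predates it). In 754 format the two zeroes differ only in the sign bit, so a byte-for-byte representation can keep them apart. The variable names are hypothetical.

```python
import math
import struct

pos = struct.pack(">d", 0.0)   # all eight bytes zero
neg = struct.pack(">d", -0.0)  # only the sign bit of the first byte set

# Unpack the negative zero and recover its sign with copysign.
sign_of_neg = math.copysign(1.0, struct.unpack(">d", neg)[0])
```

Comparing the two zeroes with == cannot tell them apart, which is why the sketch inspects the bytes and uses copysign.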


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-10 Thread Michael Hudson
Tim Peters [EMAIL PROTECTED] writes:

 marshal shouldn't be representing doubles as decimal strings to begin
 with.  All code for (de)serializing C doubles should go thru
 _PyFloat_Pack8() and _PyFloat_Unpack8().  cPickle (proto >= 1) and
 struct (std mode) already do; marshal is the oddball.

 But as the docs (floatobject.h) for these say:

 ...
  * Bug:  What this does is undefined if x is a NaN or infinity.
  * Bug:  -0.0 and +0.0 produce the same string.
  */
 PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
 PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);
 ...
  * Bug:  What this does is undefined if the string represents a NaN or
  * infinity.
  */
 PyAPI_FUNC(double) _PyFloat_Unpack4(const unsigned char *p, int le);
 PyAPI_FUNC(double) _PyFloat_Unpack8(const unsigned char *p, int le);

OTOH, the implementation has this comment:

/*
 * _PyFloat_{Pack,Unpack}{4,8}.  See floatobject.h.
 *
 * TODO:  On platforms that use the standard IEEE-754 single and double
 * formats natively, these routines could simply copy the bytes.
 */

Doing that would fix these problems, surely?[1]

The question, of course, is how to tell.  I suppose one could just do
it unconditionally and wait for one of the three remaining VAX
users[2] to compile Python 2.5 and then notice.

More conservatively, one could just do this on Windows, linux/most
architectures and Mac OS X.

Cheers,
mwh

[1] I'm slightly worried about oddball systems that do insane things
with the FPU by default -- but don't think the mooted change would
make things any worse.

[2] Exaggeration, I realize -- but how many non 754 systems are out
there?  How many will see Python 2.5?

-- 
  If you give someone Fortran, he has Fortran.
  If you give someone Lisp, he has any language he pleases.
-- Guy L. Steele Jr, quoted by David Rush in comp.lang.scheme.scsh


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-10 Thread Skip Montanaro

    Michael> I suppose one could just do it unconditionally and wait for one
    Michael> of the three remaining VAX users[2] to compile Python 2.5 and
    Michael> then notice.

You forgot the two remaining CRAY users.  Since their machines are so much
more powerful than VAXen, they have much more influence over Python
development. <wink>

Skip


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-10 Thread Tim Peters
[mwh]
 OTOH, the implementation has this comment:

 /*
 * _PyFloat_{Pack,Unpack}{4,8}.  See floatobject.h.
 *
 * TODO:  On platforms that use the standard IEEE-754 single and double
 * formats natively, these routines could simply copy the bytes.
 */
 
 Doing that would fix these problems, surely?[1]

The 754 standard doesn't say anything about how the difference between
signaling and quiet NaNs is represented.  So it's possible that a qNaN
on one box would look like an sNaN on a different box, and vice
versa.  But since most people run with all FPU traps disabled, and
Python doesn't expose a way to read the FPU status flags, they
couldn't tell the difference.

Copying bytes works perfectly for all other cases (signed zeroes,
non-zero finites, infinities), because their representations are
wholly defined, although it's possible that a subnormal on one box
will be treated like a zero (with the same sign) on a
partially-conforming box.
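
As a rough illustration of the byte-copying cases above (an editorial sketch, not code from the patch under discussion): Python's struct module with an explicit byte order stands in for the C-level copy, and the helper name roundtrip is made up for this example.

```python
import math
import struct

def roundtrip(x):
    """Pack x as an 8-byte little-endian IEEE-754 double and unpack it again."""
    return struct.unpack("<d", struct.pack("<d", x))[0]

# Signed zeroes keep their sign.
assert math.copysign(1.0, roundtrip(-0.0)) == -1.0
assert math.copysign(1.0, roundtrip(0.0)) == 1.0

# Infinities and ordinary finite values come back unchanged.
inf = float("inf")
assert roundtrip(inf) == inf and roundtrip(-inf) == -inf
assert roundtrip(1.5) == 1.5

# The smallest positive subnormal survives on a fully conforming box;
# a partially-conforming one might hand back a zero instead.
tiny = 5e-324
assert roundtrip(tiny) == tiny and tiny > 0.0
```

NaNs are deliberately left out of this sketch, since they are the one case where the bits are not wholly pinned down.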

> [1] I'm slightly worried about oddball systems that do insane things
>     with the FPU by default -- but don't think the mooted change would
>     make things any worse.

Sorry, don't know what that means.

> The question, of course, is how to tell.

Store a few small doubles at module initialization time and stare at
their bits.  That's enough to settle whether a 754 format is in use,
and, if it is, whether it's big-endian or little-endian.
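
A minimal sketch of that kind of check, written in Python rather than at C module-initialization time, so it only approximates the idea. The probe value, the constant names and the function name detect_double_format are assumptions made for this illustration; the probe is chosen so that all eight bytes of its IEEE-754 encoding differ, which makes the byte order unambiguous.

```python
import struct

# Probe value whose IEEE-754 double encoding has eight distinct bytes.
PROBE = 9006104071832581.0
# Its big-endian IEEE-754 encoding: sign 0, biased exponent 0x433,
# mantissa 0xFFF0102030405.
IEEE_BIG = b"\x43\x3f\xff\x01\x02\x03\x04\x05"

def detect_double_format():
    """Classify the native C double layout by staring at the probe's bytes."""
    native = struct.pack("@d", PROBE)  # "@" means native size and byte order
    if native == IEEE_BIG:
        return "ieee-big"
    if native == IEEE_BIG[::-1]:
        return "ieee-little"
    return "unknown"  # non-754 or mixed-endian: keep using the slow path

print(detect_double_format())
```

Anything that comes back as "unknown" would simply keep going through the existing pack/unpack arithmetic, so oddball platforms lose nothing.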

...

> [2] Exaggeration, I realize -- but how many non-754 systems are out
>     there?  How many will see Python 2.5?

No idea here.  The existing pack routines strive to do a good job of
_creating_ an IEEE-754-format representation regardless of platform
representation.  I assume that code would still be present, so
oddball platforms would be left no worse off than they are now.


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-10 Thread Alex Martelli
On Apr 10, 2005, at 13:44, Skip Montanaro wrote:
> Michael> I suppose one could just do it unconditionally and wait for one
> Michael> of the three remaining VAX users[2] to compile Python 2.5 and
> Michael> then notice.
>
> You forgot the two remaining CRAY users.  Since their machines are so much
> more powerful than VAXen, they have much more influence over Python
> development. <wink>

The latest ads I've seen from Cray were touting AMD-64 processors
anyway...;-)

Alex


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-09 Thread Skip Montanaro

Martin> Yet, this *still* is a platform dependence. Python makes no
Martin> guarantee that 1e1000 is a supported float literal on any
Martin> platform, and indeed, 1e1000 is not supported on your
Martin> platform.

Are float("inf") and float("nan") supported everywhere?  I don't have ready
access to a Windows machine, but on the couple of Linux and MacOS machines
at hand they are.  As a starting point can it be agreed on whether they
should be supported?  (There is a unique IEEE-754 representation for both
values, right?  Should we try and support any other floating point format?)
If so, then float(1e1) == float("inf") in all cases, right?  If not,
then Python's lexer should be trained to know what out-of-range floats are
and complain when it encounters them.  In either case, we should then know
how to fix marshal.loads (and probably pickle.loads).

That seems like it would be a start in the right direction.

Skip


Re: [Python-Dev] Re: marshal / unmarshal

2005-04-09 Thread Martin v. Löwis
Skip Montanaro wrote:
> Martin> Yet, this *still* is a platform dependence. Python makes no
> Martin> guarantee that 1e1000 is a supported float literal on any
> Martin> platform, and indeed, 1e1000 is not supported on your
> Martin> platform.
>
> Are float("inf") and float("nan") supported everywhere?

I would not expect that, but Tim will correct me if I'm wrong.

> As a starting point can it be agreed on whether they
> should be supported?  (There is a unique IEEE-754 representation for both
> values, right?

Perhaps yes for inf, but I think maybe no for nan. There are multiple
IEEE-754 representations for NaN. However, I understand all NaNs are
meant to compare unequal - even if they use the same representation.
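
To make that concrete, here is a small sketch (an editorial illustration; the two NaN bit patterns are picked arbitrarily and the variable names are made up). It builds NaNs from explicit big-endian IEEE-754 bytes with the struct module and checks how they compare.

```python
import math
import struct

# Two NaN bit patterns that differ only in the low mantissa (payload) bit.
nan_a = struct.unpack(">d", bytes.fromhex("7ff8000000000000"))[0]
nan_b = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]

# Both are NaNs, and a NaN compares unequal to everything, itself included.
assert math.isnan(nan_a) and math.isnan(nan_b)
assert nan_a != nan_b
assert nan_a != nan_a

# Infinity, by contrast, has exactly one encoding per sign.
assert struct.pack(">d", float("inf")).hex() == "7ff0000000000000"
assert struct.pack(">d", float("-inf")).hex() == "fff0000000000000"
```

So "unique representation" holds for each signed infinity but not for NaN, which is any pattern with an all-ones exponent and a non-zero mantissa.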


> If so, then float(1e1) == float("inf") in all cases, right?

Currently, not necessarily: if a large-enough exponent is supported
(which might be the case with an IEEE long double, dunno), 1e1
would be a regular value.

> That seems like it would be a start in the right direction.

Pieces of it would be a start in the right direction.

Regards,
Martin