Re: [Python-Dev] Re: marshal / unmarshal
On 14-apr-05, at 15:08, David Robinow wrote: On 4/11/05, Tim Peters [EMAIL PROTECTED] wrote: Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... I believe this was the Floating Instruction Set on the PDP 11/35. The fact that it's still remembered 30 years later shows how unusual it was. I think it was actually logical, because all PDP-11s (there were 2 or 3 FPU instruction sets/architectures in the family, IIRC) stored 32-bit integers in middle-endian order (high-order word first, but low-order byte first within each word). But note that neither of the PDP-11 FPUs was IEEE; that was a much later invention. At least, I didn't come across it until much later :-) -- Jack Jansen, [EMAIL PROTECTED], http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
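Jack's description of the PDP-11 integer layout (high-order 16-bit word first, low-order byte first within each word) can be modelled in a few lines of Python. This is only a sketch: the helper names pdp11_bytes and from_pdp11_bytes are hypothetical, and the code models byte order alone, not any PDP-11 floating-point format.

```python
import struct

def pdp11_bytes(value: int) -> bytes:
    """Lay out a 32-bit unsigned integer in PDP-11 middle-endian order.

    High-order 16-bit word first, but each word stored low byte first.
    """
    if not 0 <= value < 1 << 32:
        raise ValueError("value must fit in 32 unsigned bits")
    high_word = value >> 16
    low_word = value & 0xFFFF
    # '<H' packs each 16-bit word little-endian (low byte first).
    return struct.pack("<H", high_word) + struct.pack("<H", low_word)

def from_pdp11_bytes(data: bytes) -> int:
    """Inverse of pdp11_bytes."""
    high_word, low_word = struct.unpack("<HH", data)
    return (high_word << 16) | low_word

v = 0x0A0B0C0D
print(pdp11_bytes(v).hex())            # 0b0a0d0c
print(v.to_bytes(4, "big").hex())      # 0a0b0c0d
print(v.to_bytes(4, "little").hex())   # 0d0c0b0a
```

The middle-endian result matches neither plain big-endian nor plain little-endian order, which is why a simple "big or little?" test cannot describe such a box.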
Re: [Python-Dev] Re: marshal / unmarshal
My mail is experiencing random delays of up to a few hours at the moment. I wrote this before I saw your comments on my patch. Tim Peters [EMAIL PROTECTED] writes: [Michael Hudson] I've just submitted http://python.org/sf/1180995 which adds format codes for binary marshalling of floats if version > 1, but it doesn't quite have the effect I expected (see below): >>> inf = 1e308*1e308 >>> nan = inf/inf >>> marshal.dumps(nan, 2) Traceback (most recent call last): File "<stdin>", line 1, in ? ValueError: unmarshallable object I don't understand. Does binary marshalling _not_ mean just copying the bytes on a 754 platform? No, it means using _PyFloat_Pack8/Unpack8, like the patch description says. Making those functions just fiddle bytes when they can I regard as a separate project (watch a patch manager near you, though). If so, that won't work. I can tell! <wink> Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianness if necessary. Well, I was assuming marshal would do floats little-endian-wise, as it does for integers. Then on a big-endian 754 system, loads() will have to reverse the bytes in the little-endian marshal bytestring, and dumps() likewise. Really? Even I had worked this out... Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse.
This surely does: PyObject * PyLong_FromLongLong(PY_LONG_LONG ival) { PY_LONG_LONG bytes = ival; int one = 1; return _PyLong_FromByteArray( (unsigned char *)&bytes, SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1); } It occurs to me that in the IEEE case, special values can be detected with reliability -- by picking the exponent field out by force -- and a warning emitted or exception raised. Good idea? Hard to say, to me. Cheers, mwh Oh, by the way: http://python.org/sf/1181301 -- It is time-consuming to produce high-quality software. However, that should not alone be a reason to give up the high standards of Python development. -- Martin von Loewis, python-dev
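The idea of picking the exponent field out by force can be sketched in Python by reinterpreting a float's eight bytes as one 64-bit integer. This assumes the host float is IEEE-754 binary64, and classify_double is a hypothetical helper name. The zero branch shows why signed zeroes are the awkward case: only the sign bit separates +0.0 from -0.0, and the two compare equal.

```python
import struct

def classify_double(x: float) -> str:
    """Classify an IEEE-754 binary64 value by inspecting its bit fields."""
    # '>Q' reinterprets the 8 big-endian bytes as one unsigned 64-bit int,
    # so the field extraction below does not depend on native byte order.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)      # 52-bit fraction field
    if exponent == 0x7FF:
        # All exponent bits set: infinity if the fraction is zero, else NaN.
        if mantissa:
            return "nan"
        return "-inf" if sign else "+inf"
    if exponent == 0:
        if mantissa == 0:
            # Zero: only the sign bit tells +0.0 and -0.0 apart.
            return "-0.0" if sign else "+0.0"
        return "subnormal"
    return "normal"

for value in (1.5, -0.0, float("inf"), float("nan"), 5e-324):
    print(value, classify_double(value))
```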
Re: [Python-Dev] Re: marshal / unmarshal
... [mwh] I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. [Tim] Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse. [mwh] This surely does: PyObject * PyLong_FromLongLong(PY_LONG_LONG ival) { PY_LONG_LONG bytes = ival; int one = 1; return _PyLong_FromByteArray( (unsigned char *)&bytes, SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1); } Yes, that's "casting abuse". Python does very little of that. If it becomes necessary, it's straightforward but long-winded to rewrite the above in wholly portable C (peel the bytes out of ival, least-significant first, via shifting and masking 8 times; ival & 0xff is the least-significant byte regardless of memory storage order; etc). BTW, the IS_LITTLE_ENDIAN macro also relies on casting abuse, and more deeply than does the visible cast there. It occurs to me that in the IEEE case, special values can be detected with reliability -- by picking the exponent field out by force Right, that works for NaNs and infinities; signed zeroes are a bit trickier to detect. -- and a warning emitted or exception raised. Good idea? Hard to say, to me. It's not possible to _create_ a NaN or infinity from finite operands in 754 without signaling some exceptional condition. Once you have one, though, there's generally nothing exceptional about _using_ it. Sometimes there is, like +Inf - +Inf or Inf / Inf, but not generally. Using a quiet NaN never signals; using a signaling NaN almost always signals. So packing a nan or inf shouldn't complain. On a 754 box, unpacking one shouldn't complain either. Unpacking a nan or inf on a non-754 box probably should complain, since there's in general nothing it can be unpacked _to_ that makes any sense ("errors should never pass silently").
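Tim's portable alternative to the cast (peel the bytes off least-significant first with shifts and masks) looks like this when transcribed to Python. It is a sketch, not CPython's code: long_long_to_le_bytes is a hypothetical name, and struct.pack("<q", ...) is used only as a cross-check. In C the same loop should run on an unsigned copy of ival, since right-shifting a negative signed value is implementation-defined.

```python
import struct

def long_long_to_le_bytes(ival: int) -> bytes:
    """Portable little-endian encoding of a signed 64-bit integer.

    Uses only shifts and masks, so nothing depends on how the host
    happens to lay out integers in memory.
    """
    if not -(1 << 63) <= ival < (1 << 63):
        raise OverflowError("value does not fit in 64 bits")
    u = ival & 0xFFFFFFFFFFFFFFFF   # two's-complement view as unsigned
    out = bytearray()
    for _ in range(8):
        out.append(u & 0xFF)        # least-significant byte first
        u >>= 8
    return bytes(out)

value = 0x0102030405060708
print(long_long_to_le_bytes(value).hex())                       # 0807060504030201
print(long_long_to_le_bytes(value) == struct.pack("<q", value))  # True
```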
Re: [Python-Dev] Re: marshal / unmarshal
Tim Peters [EMAIL PROTECTED] writes: ... [mwh] I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. [Tim] Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse. [mwh] This surely does: PyObject * PyLong_FromLongLong(PY_LONG_LONG ival) { PY_LONG_LONG bytes = ival; int one = 1; return _PyLong_FromByteArray( (unsigned char *)&bytes, SIZEOF_LONG_LONG, IS_LITTLE_ENDIAN, 1); } Yes, that's "casting abuse". Python does very little of that. If it becomes necessary, it's straightforward but long-winded to rewrite the above in wholly portable C (peel the bytes out of ival, least-significant first, via shifting and masking 8 times; ival & 0xff is the least-significant byte regardless of memory storage order; etc). Not arguing with that. BTW, the IS_LITTLE_ENDIAN macro also relies on casting abuse, and more deeply than does the visible cast there. I'd like to claim that was part of my point :) There is a certain, small level of assumption in Python that big-endian or little-endian is the only question to ask -- and I don't think that's a problem! Even if that assumption turns out wrong somewhere, it isn't a big deal: at least if we choose a more interesting 'probe value' than 1.5, an oddball box will just degrade to the non-IEEE code. It occurs to me that in the IEEE case, special values can be detected with reliability -- by picking the exponent field out by force Right, that works for NaNs and infinities; signed zeroes are a bit trickier to detect. Hmm. Don't think they're such a big deal. -- and a warning emitted or exception raised. Good idea? Hard to say, to me. It's not possible to _create_ a NaN or infinity from finite operands in 754 without signaling some exceptional condition. Once you have one, though, there's generally nothing exceptional about _using_ it. Sometimes there is, like +Inf - +Inf or Inf / Inf, but not generally.
Using a quiet NaN never signals; using a signaling NaN almost always signals. So packing a nan or inf shouldn't complain. On a 754 box, unpacking one shouldn't complain either. Unpacking a nan or inf on a non-754 box probably should complain, since there's in general nothing it can be unpacked _to_ that makes any sense ("errors should never pass silently"). This sounds like good behaviour to me. I'll try to update the patch soon. Cheers, mwh -- BUGS Never use this function. This function modifies its first argument. The identity of the delimiting character is lost. This function cannot be used on constant strings. -- the glibc manpage for strtok(3)
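The policy agreed on here (packing or unpacking a NaN or infinity is fine on a 754 box, but unpacking one on a non-754 box should raise) might look roughly like the following Python sketch. The host_is_ieee754 flag is a hypothetical stand-in for whatever platform detection the real code would use, and the finite path simply defers to struct for brevity rather than performing a real non-IEEE conversion.

```python
import struct

def unpack_double_le(data: bytes, host_is_ieee754: bool) -> float:
    """Unpack 8 little-endian IEEE-754 bytes following the NaN/inf policy.

    host_is_ieee754 is a hypothetical flag standing in for real detection.
    """
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    (bits,) = struct.unpack("<Q", data)
    exponent = (bits >> 52) & 0x7FF
    if exponent == 0x7FF and not host_is_ieee754:
        # A NaN or infinity has nothing sensible to become on a non-754 box,
        # and errors should never pass silently.
        raise ValueError("can't unpack a NaN or infinity on a non-IEEE platform")
    # Finite values (or any value on a 754 box): decode normally.
    # A real non-IEEE port would need its own conversion here.
    return struct.unpack("<d", data)[0]

inf_bytes = struct.pack("<d", float("inf"))
print(unpack_double_le(inf_bytes, host_is_ieee754=True))   # inf
```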
Re: [Python-Dev] Re: marshal / unmarshal
Tim Peters [EMAIL PROTECTED] writes: The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. OK. Do you have any intuition as to whether 754 implementations actually *do* differ on this point? Copying bytes works perfectly for all other cases (signed zeroes, non-zero finites, infinities), because their representations are wholly defined, although it's possible that a subnormal on one box will be treated like a zero (with the same sign) on a partially-conforming box. I'd find struggling to care about that pretty hard. [1] I'm slightly worried about oddball systems that do insane things with the FPU by default -- but don't think the mooted change would make things any worse. Sorry, don't know what that means. Neither do I, now. Oh well <wink>. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at ./configure time, surely? their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. Do you have a pointer to code that does this? Googling around the subject appears to turn up lots of Python stuff... [2] Exaggeration, I realize -- but how many non-754 systems are out there? How many will see Python 2.5? No idea here. The existing pack routines strive to do a good job of _creating_ an IEEE-754-format representation regardless of platform representation. I assume that code would still be present, so oddball platforms would be left no worse off than they are now. Well, yes, given the above. The text this footnote was attached to was asking if just assuming 754 float formats would inconvenience anyone.
Cheers, mwh -- I don't have any special knowledge of all this. In fact, I made all the above up, in the hope that it corresponds to reality. -- Mark Carroll, ucam.chat
Re: [Python-Dev] Re: marshal / unmarshal
[Tim] The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. [mwh] OK. Do you have any intuition as to whether 754 implementations actually *do* differ on this point? Not anymore -- hasn't been part of my job, or a hobby, for over a decade. There were differences a decade+ ago. All NaNs have all exponent bits set, and at least one mantissa bit set, and every bit pattern of that form represents a NaN. That's all the standard says. The most popular way to distinguish quiet from signaling NaNs keyed off the most-significant mantissa bit: set for a qNaN, clear for an sNaN. It's possible that all 754 HW does that now. There's at least still this: Pentium hardware adds a third not-a-number possibility. In addition to 754's quiet and signaling NaNs, it also has "indeterminate" values. Here w/ native Windows Python 2.4 on a Pentium: >>> inf = 1e300 * 1e300 >>> inf - inf # indeterminate -1.#IND >>> - _ # but the negation of IND is a quiet NaN 1.#QNAN Do the same thing under Cygwin Python on the same box and it prints "NaN" twice. Do people care about this? I don't know. It seems unlikely -- in effect, IND just gives a special string name to a single one of the many bit patterns that represent a quiet NaN. OTOH, Pentium hardware still preserves this distinction, and MS library docs do too. IND isn't part of the 754 standard (although, IIRC, it was part of a pre-standard draft, which Intel implemented and is now stuck with).
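The quiet/signaling convention described above (most-significant mantissa bit set for a qNaN, clear for an sNaN) is a popular convention rather than something the 754-1985 standard mandates, so the following Python sketch should be read with that caveat. It classifies raw 64-bit patterns. The INDEFINITE constant assumes the x86 "indefinite" value is the quiet NaN with the sign bit set and the rest of the fraction zero; negating it, which flips only the sign bit, still leaves a quiet NaN, consistent with the 1.#IND / 1.#QNAN session.

```python
import struct

EXPONENT_ALL_ONES = 0x7FF
FRACTION_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51    # most-significant bit of the 52-bit fraction field
SIGN_BIT = 1 << 63

# Assumed x86 "indefinite" pattern: sign set, quiet bit set, rest zero.
INDEFINITE = 0xFFF8000000000000

def nan_kind(bits: int) -> str:
    """Classify a raw binary64 pattern using the common qNaN/sNaN convention."""
    exponent = (bits >> 52) & EXPONENT_ALL_ONES
    fraction = bits & FRACTION_MASK
    if exponent != EXPONENT_ALL_ONES or fraction == 0:
        return "not a NaN"   # a finite value or an infinity
    return "quiet" if fraction & QUIET_BIT else "signaling"

def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]

print(nan_kind(INDEFINITE))              # quiet
print(nan_kind(INDEFINITE ^ SIGN_BIT))   # quiet: negation flips only the sign
print(nan_kind(0x7FF0000000000001))      # signaling
```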
Copying bytes works perfectly for all other cases (signed zeroes, non-zero finites, infinities), because their representations are wholly defined, although it's possible that a subnormal on one box will be treated like a zero (with the same sign) on a partially-conforming box. I'd find struggling to care about that pretty hard. Me too. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. Module initialization code is harder to screw up for that reason (the code is in an obvious place then, self-contained, and doesn't require any relevant knowledge of any platform porter unless/until it breaks). their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. Do you have a pointer to code that does this? No. Pemberton's enquire.c contains enough code to do it. Given how few distinct architectures still exist, it's probably enough to store just double x = 1.5 and stare at it. [2] Exaggeration, I realize -- but how many non-754 systems are out there? How many will see Python 2.5? No idea here. The existing pack routines strive to do a good job of _creating_ an IEEE-754-format representation regardless of platform representation. I assume that code would still be present, so oddball platforms would be left no worse off than they are now. Well, yes, given the above. The text this footnote was attached to was asking if just assuming 754 float formats would inconvenience anyone. I think I'm still missing your intent here. If you're asking whether Python can blindly assume that 754 is in use, I'd say that's undesirable but defensible if necessary.
Re: [Python-Dev] Re: marshal / unmarshal
Tim Peters [EMAIL PROTECTED] writes: [Tim] The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. [mwh] OK. Do you have any intuition as to whether 754 implementations actually *do* differ on this point? Not anymore -- hasn't been part of my job, or a hobby, for over a decade. There were differences a decade+ ago. All NaNs have all exponent bits set, and at least one mantissa bit set, and every bit pattern of that form represents a NaN. That's all the standard says. The most popular way to distinguish quiet from signaling NaNs keyed off the most-significant mantissa bit: set for a qNaN, clear for an sNaN. It's possible that all 754 HW does that now. [snip details] OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? This doesn't seem very bad. [denorms] I'd find struggling to care about that pretty hard. Me too. Good. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. But they all have pyconfig.h. Module initialization code is harder to screw up for that reason (the code is in an obvious place then, self-contained, and doesn't require any relevant knowledge of any platform porter unless/until it breaks). Well, sure, but false negatives are not a big deal here. their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. Do you have a pointer to code that does this? No. Pemberton's enquire.c contains enough code to do it. Yikes! And much else besides.
Given how few distinct architectures still exist, it's probably enough to store just double x = 1.5 and stare at it. Something along these lines: double x = 1.5; is_big_endian_ieee_double = sizeof(double) == 8 && memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0; ? [me being obscure] I think I'm still missing your intent here. If you're asking whether Python can blindly assume that 754 is in use, I'd say that's undesirable but defensible if necessary. Yes, that's what I was asking, in a rather obscure way. Cheers, mwh -- Strangely enough I saw just such a beast at the grocery store last night. Starbucks sells Javachip. (It's ice cream, but that shouldn't be an obstacle for the Java marketing people.) -- Jeremy Hylton, 29 Apr 1997
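The 1.5 probe can be tried from Python, where struct's native 'd' format copies the host's in-memory bytes. The octal string in the C snippet, \077\370 followed by six zero bytes, is 0x3FF8000000000000, the big-endian binary64 encoding of 1.5; note that C's memcmp returns 0 on a match. detect_double_format and the format names are hypothetical; this is a sketch of the idea, not Python's implementation.

```python
import struct

# Big-endian IEEE-754 binary64 bytes of 1.5 (0x3FF8000000000000),
# the same bytes as the octal string "\077\370\000..." in the C snippet.
BIG_ENDIAN_1_5 = b"\x3f\xf8\x00\x00\x00\x00\x00\x00"
LITTLE_ENDIAN_1_5 = BIG_ENDIAN_1_5[::-1]

def detect_double_format(native: bytes) -> str:
    """Classify the host double format from the native bytes of 1.5."""
    if len(native) != 8:
        return "unknown"
    if native == BIG_ENDIAN_1_5:
        return "ieee_big_endian"
    if native == LITTLE_ENDIAN_1_5:
        return "ieee_little_endian"
    return "unknown"

# Native mode ('d' with no byte-order prefix) copies the host's bytes.
print(detect_double_format(struct.pack("d", 1.5)))
```

Anything that matches neither pattern falls through to "unknown", which is the safe direction: such a box would keep using the portable pack routines.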
Re: [Python-Dev] Re: marshal / unmarshal
... [mwh] OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianness if necessary. Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... This doesn't seem very bad. Not bad at all: But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. But they all have pyconfig.h. Yes, and then a platform porter has to understand what to #define/#undefine, and why. People doing cross-compilation may have an especially confusing time of it. Module initialization code just works, so I certainly understand why it doesn't appeal to the Unix frame of mind <wink>. Module initialization code is harder to screw up for that reason (the code is in an obvious place then, self-contained, and doesn't require any relevant knowledge of any platform porter unless/until it breaks). Well, sure, but false negatives are not a big deal here. Sorry, unsure what "false negative" means here. ... Something along these lines: double x = 1.5; is_big_endian_ieee_double = sizeof(double) == 8 && memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0; Right, it's that easy -- at least under MSVC and gcc.
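Why a "more interesting" probe than 1.5 helps can be shown concretely: six of the eight bytes of 1.5 are zero, so a layout that only shuffles those bytes looks identical to plain big-endian. The Python sketch below compares a few hypothetical byte orders; PROBE is an arbitrary value picked only because its eight bytes are all different, so any reordering of them changes the byte string.

```python
import struct

def byte_layouts(value: float) -> dict:
    """Return the 8 bytes of value under a few byte orders (hypothetical set)."""
    be = struct.pack(">d", value)
    return {
        "big": be,
        "little": be[::-1],
        # Hypothetical oddball: swap the two 32-bit halves of the big-endian layout.
        "word_swapped_big": be[4:] + be[:4],
        # Hypothetical oddball: reverse only the low-order four bytes.
        "low_word_reversed": be[:4] + be[4:][::-1],
    }

def distinguishes_all(value: float) -> bool:
    """True if every layout above yields a different byte string for value."""
    layouts = byte_layouts(value)
    return len(set(layouts.values())) == len(layouts)

# A normal, finite double whose eight bytes are all distinct.
PROBE = struct.unpack(">d", bytes.fromhex("4001020304050607"))[0]

print(distinguishes_all(1.5))     # False: two layouts collide
print(distinguishes_all(PROBE))   # True
```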
Re: [Python-Dev] Re: marshal / unmarshal
I've just submitted http://python.org/sf/1180995 which adds format codes for binary marshalling of floats if version > 1, but it doesn't quite have the effect I expected (see below): >>> inf = 1e308*1e308 >>> nan = inf/inf >>> marshal.dumps(nan, 2) Traceback (most recent call last): File "<stdin>", line 1, in ? ValueError: unmarshallable object frexp(nan, &e), it turns out, returns nan, which results in this (to be expected if you read _PyFloat_Pack8 and know that I'm using a new-ish GCC -- it might be different for MSVC 6). Also (this is the same thing, really): >>> struct.pack('d', inf) Traceback (most recent call last): File "<stdin>", line 1, in ? SystemError: frexp() result out of range Although I was a little surprised by this: >>> struct.pack('d', inf) '\x7f\xf0\x00\x00\x00\x00\x00\x00' (this is a big-endian system). Again, reading the source explains the behaviour. Tim Peters [EMAIL PROTECTED] writes: ... [mwh] OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianness if necessary. Well, I was assuming marshal would do floats little-endian-wise, as it does for integers. Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. But they all have pyconfig.h. Yes, and then a platform porter has to understand what to #define/#undefine, and why. People doing cross-compilation may have an especially confusing time of it.
Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all that much (this is what I meant by false negatives below). Module initialization code just works, so I certainly understand why it doesn't appeal to the Unix frame of mind <wink>. It just strikes me as silly to test at runtime something that is so obviously not going to change between invocations. But it's not a big deal either way. ... Something along these lines: double x = 1.5; is_big_endian_ieee_double = sizeof(double) == 8 && memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0; Right, it's that easy Cool. -- at least under MSVC and gcc. Huh? Now it's my turn to be confused (for starters, under MSVC ieee doubles really can be assumed...). Cheers, mwh -- You sound surprised. We're talking about a government department here - they have procedures, not intelligence. -- Ben Hutchings, cam.misc
Re: [Python-Dev] Re: marshal / unmarshal
[Michael Hudson] I've just submitted http://python.org/sf/1180995 which adds format codes for binary marshalling of floats if version > 1, but it doesn't quite have the effect I expected (see below): >>> inf = 1e308*1e308 >>> nan = inf/inf >>> marshal.dumps(nan, 2) Traceback (most recent call last): File "<stdin>", line 1, in ? ValueError: unmarshallable object I don't understand. Does binary marshalling _not_ mean just copying the bytes on a 754 platform? If so, that won't work. I pointed out the relevant comments before: /* The pack routines write 4 or 8 bytes, starting at p. ... * Bug: What this does is undefined if x is a NaN or infinity. * Bug: -0.0 and +0.0 produce the same string. */ PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le); PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le); frexp(nan, &e), it turns out, returns nan, This is an undefined case in C89 (all 754 special values are). which results in this (to be expected if you read _PyFloat_Pack8 and know that I'm using a new-ish GCC -- it might be different for MSVC 6). Also (this is the same thing, really): Right. So is pickling with proto >= 1. Changing the pack/unpack routines to copy bytes instead (when possible) fixes all of these things at one stroke, on boxes where it applies. >>> struct.pack('d', inf) Traceback (most recent call last): File "<stdin>", line 1, in ? SystemError: frexp() result out of range Although I was a little surprised by this: >>> struct.pack('d', inf) '\x7f\xf0\x00\x00\x00\x00\x00\x00' (this is a big-endian system). Again, reading the source explains the behaviour. OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianness if necessary. Well, I was assuming marshal would do floats little-endian-wise, as it does for integers.
Then on a big-endian 754 system, loads() will have to reverse the bytes in the little-endian marshal bytestring, and dumps() likewise. That's all "if necessary" meant -- sometimes cast + memcpy isn't enough, and regardless of which direction marshal decides to use. Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse. Copying internal native bytes across boxes is plain ugly -- can't get more brittle than that. In this case it looks like a good tradeoff, though. ... Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all that much (this is what I meant by false negatives below). ... It just strikes me as silly to test at runtime something that is so obviously not going to change between invocations. But it's not a big deal either way. It isn't to me either. It just strikes me as silly to give porters another thing to wonder about and screw up when it's possible to solve it completely with a few measly runtime cycles <wink>. Something along these lines: double x = 1.5; is_big_endian_ieee_double = sizeof(double) == 8 && memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0; Right, it's that easy Cool. -- at least under MSVC and gcc. Huh? Now it's my turn to be confused (for starters, under MSVC ieee doubles really can be assumed...). So you have no argument with the "at least under MSVC" part <wink>. There's nothing to worry about here -- I was just tweaking.
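The "copy the bytes, reverse them if necessary" approach for a little-endian marshal format might be sketched in Python as follows. It assumes the host double is IEEE-754 binary64 stored in the same byte order as integers (which is what sys.byteorder reports), so it deliberately ignores mixed-endian layouts; dumps_double_le and loads_double_le are hypothetical names, not marshal's API.

```python
import struct
import sys

def dumps_double_le(x: float) -> bytes:
    """Serialize x as 8 little-endian IEEE-754 bytes by copying native bytes.

    Assumes the host double is binary64 in plain big- or little-endian
    order matching sys.byteorder (no mixed-endian layouts).
    """
    native = struct.pack("d", x)   # native mode: the host's in-memory bytes
    return native if sys.byteorder == "little" else native[::-1]

def loads_double_le(data: bytes) -> float:
    """Inverse of dumps_double_le under the same assumptions."""
    native = data if sys.byteorder == "little" else data[::-1]
    return struct.unpack("d", native)[0]

# NaNs, infinities and signed zeroes need no special cases: bytes are bytes.
print(dumps_double_le(1.5).hex())   # 000000000000f83f
```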
Re: [Python-Dev] Re: marshal / unmarshal
marshal shouldn't be representing doubles as decimal strings to begin with. All code for (de)serializing C doubles should go thru _PyFloat_Pack8() and _PyFloat_Unpack8(). cPickle (proto >= 1) and struct (std mode) already do; marshal is the oddball. But as the docs (floatobject.h) for these say: ... * Bug: What this does is undefined if x is a NaN or infinity. * Bug: -0.0 and +0.0 produce the same string. */ PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le); PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le); ... * Bug: What this does is undefined if the string represents a NaN or * infinity. */ PyAPI_FUNC(double) _PyFloat_Unpack4(const unsigned char *p, int le); PyAPI_FUNC(double) _PyFloat_Unpack8(const unsigned char *p, int le);
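One easy way to end up with the "-0.0 and +0.0 produce the same string" bug is to decide the sign with a comparison. The sketch below contrasts that with math.copysign and with reading bit 63 directly; it illustrates the pitfall and is not a claim about how _PyFloat_Pack8 is actually written. The three function names are hypothetical.

```python
import math
import struct

def sign_bit_naive(x: float) -> int:
    # A comparison can't see the sign of zero: -0.0 < 0 is False.
    return 1 if x < 0 else 0

def sign_bit_copysign(x: float) -> int:
    # copysign transfers the sign bit itself, including for zeroes.
    return 1 if math.copysign(1.0, x) < 0 else 0

def sign_bit_raw(x: float) -> int:
    # Read bit 63 of the big-endian IEEE-754 representation directly.
    return struct.pack(">d", x)[0] >> 7

for f in (sign_bit_naive, sign_bit_copysign, sign_bit_raw):
    print(f.__name__, f(-0.0))
```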
Re: [Python-Dev] Re: marshal / unmarshal
Tim Peters [EMAIL PROTECTED] writes: marshal shouldn't be representing doubles as decimal strings to begin with. All code for (de)serializing C doubles should go thru _PyFloat_Pack8() and _PyFloat_Unpack8(). cPickle (proto >= 1) and struct (std mode) already do; marshal is the oddball. But as the docs (floatobject.h) for these say: ... * Bug: What this does is undefined if x is a NaN or infinity. * Bug: -0.0 and +0.0 produce the same string. */ PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le); PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le); ... * Bug: What this does is undefined if the string represents a NaN or * infinity. */ PyAPI_FUNC(double) _PyFloat_Unpack4(const unsigned char *p, int le); PyAPI_FUNC(double) _PyFloat_Unpack8(const unsigned char *p, int le); OTOH, the implementation has this comment: /* * _PyFloat_{Pack,Unpack}{4,8}. See floatobject.h. * * TODO: On platforms that use the standard IEEE-754 single and double * formats natively, these routines could simply copy the bytes. */ Doing that would fix these problems, surely?[1] The question, of course, is how to tell. I suppose one could just do it unconditionally and wait for one of the three remaining VAX users[2] to compile Python 2.5 and then notice. More conservatively, one could just do this on Windows, linux/most architectures and Mac OS X. Cheers, mwh [1] I'm slightly worried about oddball systems that do insane things with the FPU by default -- but don't think the mooted change would make things any worse. [2] Exaggeration, I realize -- but how many non-754 systems are out there? How many will see Python 2.5? -- If you give someone Fortran, he has Fortran. If you give someone Lisp, he has any language he pleases. -- Guy L.
Steele Jr, quoted by David Rush in comp.lang.scheme.scsh
Re: [Python-Dev] Re: marshal / unmarshal
Michael I suppose one could just do it unconditionally and wait for one Michael of the three remaining VAX users[2] to compile Python 2.5 and Michael then notice. You forgot the two remaining CRAY users. Since their machines are so much more powerful than VAXen, they have much more influence over Python development. <wink> Skip
Re: [Python-Dev] Re: marshal / unmarshal
[mwh] OTOH, the implementation has this comment: /* * _PyFloat_{Pack,Unpack}{4,8}. See floatobject.h. * * TODO: On platforms that use the standard IEEE-754 single and double * formats natively, these routines could simply copy the bytes. */ Doing that would fix these problems, surely?[1] The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. Copying bytes works perfectly for all other cases (signed zeroes, non-zero finites, infinities), because their representations are wholly defined, although it's possible that a subnormal on one box will be treated like a zero (with the same sign) on a partially-conforming box. [1] I'm slightly worried about oddball systems that do insane things with the FPU by default -- but don't think the mooted change would make things any worse. Sorry, don't know what that means. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. ... [2] Exaggeration, I realize -- but how many non-754 systems are out there? How many will see Python 2.5? No idea here. The existing pack routines strive to do a good job of _creating_ an IEEE-754-format representation regardless of platform representation. I assume that code would still be present, so oddball platforms would be left no worse off than they are now.
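For platforms where copying bytes is not an option, a pack routine has to build the IEEE-754 representation arithmetically. The following Python sketch does that with math.frexp, in the spirit of the existing routines but not CPython's code; pack_double_be_portable is a hypothetical name. It rejects NaN and infinity explicitly instead of leaving them undefined, and it keeps the sign of zero. struct.pack(">d", ...) appears only in the check.

```python
import math
import struct

def pack_double_be_portable(x: float) -> bytes:
    """Build big-endian IEEE-754 binary64 bytes using only arithmetic.

    Never looks at the host's memory layout. NaN and infinity are
    rejected explicitly rather than producing undefined output.
    """
    if math.isnan(x) or math.isinf(x):
        raise ValueError("this sketch does not handle NaN or infinity")
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    x = abs(x)
    if x == 0.0:
        return (sign << 63).to_bytes(8, "big")
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= m < 1
    biased = e + 1022               # IEEE form 1.f * 2**(e-1), bias 1023
    if biased > 0:
        # Normal number: scale m to a 53-bit integer, drop the hidden bit.
        fraction = int(math.ldexp(m, 53)) - (1 << 52)
    else:
        # Subnormal: value is fraction * 2**-1074, exponent field zero.
        biased = 0
        fraction = int(math.ldexp(x, 1074))
    bits = (sign << 63) | (biased << 52) | fraction
    return bits.to_bytes(8, "big")

print(pack_double_be_portable(1.5) == struct.pack(">d", 1.5))   # True
```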
Re: [Python-Dev] Re: marshal / unmarshal
On Apr 10, 2005, at 13:44, Skip Montanaro wrote: Michael I suppose one could just do it unconditionally and wait for one Michael of the three remaining VAX users[2] to compile Python 2.5 and Michael then notice. You forgot the two remaining CRAY users. Since their machines are so much more powerful than VAXen, they have much more influence over Python development. <wink> The latest ads I've seen from Cray were touting AMD-64 processors anyway... ;-) Alex
Re: [Python-Dev] Re: marshal / unmarshal
Martin Yet, this *still* is a platform dependence. Python makes no Martin guarantee that 1e1000 is a supported float literal on any Martin platform, and indeed, on your platform, 1e1000 is not supported Martin on your platform. Are float("inf") and float("nan") supported everywhere? I don't have ready access to a Windows machine, but on the couple Linux and MacOS machines at-hand they are. As a starting point can it be agreed on whether they should be supported? (There is a unique IEEE-754 representation for both values, right? Should we try and support any other floating point format?) If so, then float(1e1) == float("inf") in all cases, right? If not, then Python's lexer should be trained to know what out-of-range floats are and complain when it encounters them. In either case, we should then know how to fix marshal.loads (and probably pickle.loads). That seems like it would be a start in the right direction. Skip
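For comparison with the 2005 situation Skip asks about: in current CPython 3.x, float() accepts "inf", "infinity" and "nan" in any letter case, and a decimal string that overflows, such as "1e1000", parses to infinity rather than raising. The helper below is a hypothetical sketch for checking this on a given build; it says nothing about what the 2.4-era builds in this thread did.

```python
import math

def describe(text: str) -> str:
    """Parse text with float() and name the IEEE-754 class of the result."""
    value = float(text)
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "-inf" if value < 0 else "+inf"
    return "finite"

for s in ("inf", "-Infinity", "nan", "1e1000", "1e308"):
    print(s, "->", describe(s))
```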
Re: [Python-Dev] Re: marshal / unmarshal
Skip Montanaro wrote: Martin Yet, this *still* is a platform dependence. Python makes no Martin guarantee that 1e1000 is a supported float literal on any Martin platform, and indeed, on your platform, 1e1000 is not supported Martin on your platform. Are float("inf") and float("nan") supported everywhere? I would not expect that, but Tim will correct me if I'm wrong. As a starting point can it be agreed on whether they should be supported? (There is a unique IEEE-754 representation for both values, right? Perhaps yes for inf, but I think maybe no for nan. There are multiple IEEE-754 representations for NaN. However, I understand all NaNs are meant to compare unequal - even if they use the same representation. If so, then float(1e1) == float("inf") in all cases, right? Currently, not necessarily: if a large-enough exponent is supported (which might be the case with an IEEE long double, dunno), 1e1 would be a regular value. That seems like it would be a start in the right direction. Pieces of it would be a start in the right direction. Regards, Martin
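Martin's two points (many bit patterns are NaNs, and every NaN compares unequal, even to itself) are easy to check from Python by decoding raw 64-bit patterns. The specific NaN patterns below are arbitrary examples, all quiet NaNs, and the sketch assumes an IEEE-754 binary64 host. For contrast, +infinity has exactly one encoding, 0x7FF0000000000000.

```python
import math
import struct

def decode(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE-754 double."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]

# Three different bit patterns (arbitrary examples), all NaNs:
# exponent field all ones, fraction field non-zero.
NAN_PATTERNS = (0x7FF8000000000000, 0x7FF8000000001234, 0xFFF8000000000000)

nans = [decode(b) for b in NAN_PATTERNS]
for bits, x in zip(NAN_PATTERNS, nans):
    print(hex(bits), math.isnan(x), x == x)   # ... True False
```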