Executive summary (details and discussion points below)
=================

Some time ago, Noam Raphael pointed out that for a float x, repr(x) can often be much shorter than it currently is, without sacrificing the property that eval(repr(x)) == x, and proposed changing Python accordingly. See
http://bugs.python.org/issue1580

For example, instead of the current behaviour:

Python 3.1a2+ (py3k:71353:71354, Apr 7 2009, 12:55:16)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001
>>> 0.04 == eval(repr(0.04))
True

we'd have this:

Python 3.1a2+ (py3k-short-float-repr:71350:71352M, Apr 7 2009, )
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04
>>> 0.04 == eval(repr(0.04))
True

Initial attempts to implement this encountered various difficulties, and at some point Tim Peters pointed out (I'm paraphrasing horribly here) that one can't have all three of {fast, easy, correct}.

One PyCon 2009 sprint later, Eric Smith and I have produced the py3k-short-float-repr branch, which implements short repr of floats and also does some major cleaning up of the current float formatting functions. We've gone for the {fast, correct} pairing. We'd like to get this into Python 3.1.

Any thoughts/objections/counter-proposals/...?

More details
============

Our solution is based on an adaptation of David Gay's 'perfect rounding' code for inclusion in Python. To make eval(repr(x)) round-tripping work, one needs correctly rounded float -> decimal *and* decimal -> float conversions: Gay's code provides correctly rounded dtoa and strtod functions for these two conversions. His code is well known and well tested: it's used as the basis of the glibc strtod, and is also in OS X. It's available from http://www.netlib.org/fp/dtoa.c

So our branch contains a new file Python/dtoa.c, which is a cut-down version of Gay's original file.
(We've removed stuff for VAX and IBM floating-point formats, hex NaNs, hex floating-point formats, locale-aware interpretation of the decimal separator, K&R headers, code for correct setting of the inexact flag, and various other bits and pieces that Python doesn't care about.)

Most of the rest of the work is in the existing file Python/pystrtod.c. Every float -> string or string -> float conversion goes through a function in this file at some point.

Gay's code also provides the opportunity to clean up the current float formatting code, and Eric has reworked a lot of the float formatting in the py3k-short-float-repr branch. This reworking should make finishing off the implementation of things like thousands separators much more straightforward.

One example of this: the previous string -> float conversion used the system strtod, which is locale-aware, so the code had to first replace the '.' by the current locale's decimal separator, *then* call strtod. There was a similar dance in the reverse direction when doing float -> string conversion. Both of these are now unnecessary.

The current code is pretty close to ready for merging to py3k. I've uploaded a patchset to Rietveld:

http://codereview.appspot.com/33084/show

Apart from the short float repr and a couple of bugfixes, all behaviour should be unchanged from before, with a few exceptions:

- format(1e200, '<') doesn't behave quite as it did before. See item (3) below for details.
- repr switches to using exponential notation at 1e16 instead of the previous 1e17. This avoids a subtle issue where the 'short float repr' result is padded with bogus zeros.
- A similar change applies to str, which switches to exponential notation at 1e11, not 1e12.
This fixes the following minor annoyance, which goes back at least as far as Python 2.5 (and probably much further):

>>> x = 1e11 + 0.5
>>> x
100000000000.5
>>> print(x)
100000000000.0

That .0 seems wrong to me: if we're going to go to the trouble of printing extra digits (str usually only gives 12 significant digits; here there are 13), they should be the *right* extra digits.

Discussion points
=================

(1) Any objections to including this in py3k? If there's controversy, then I guess we'll need a PEP.

(2) Should other Python implementations (Jython, IronPython, etc.) be expected to use short float repr, or should it just be considered an implementation detail of CPython? I propose the latter, except that all implementations should be required to satisfy eval(repr(x)) == x for finite floats x.

(3) There's a PEP 3101 line we don't know what to do with. In py3k, we currently have:

>>> format(1e200, '<')
'1.0e+200'

but in our py3k-short-float-repr branch:

>>> format(1e200, '<')
'1e+200'

Which is correct? The py3k behaviour comes from the 'Standard Format Specifiers' section of PEP 3101, where it says:

"""
The available floating point presentation types are:

[... list of other format codes omitted here ...]

'' (None) - similar to 'g', except that it prints at least one digit after the decimal point.
"""

It's that 'at least one digit after the decimal point' bit that's at issue. I understood this to apply only to floats converted to a string *without* an exponent; this is the way that repr and str work, adding a .0 to floats formatted without an exponent, but leaving the .0 out when the exponent is present.

Should the .0 always be added? Or is it required only when it would be necessary to distinguish a float string from an integer string? My preference is for the latter (i.e., format(x, '<') should behave in the same way as repr and str in this respect). But I'm biased, not least because the other behaviour would be a pain to implement. Does anyone care?
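To make the repr rules above concrete, here is a rough Python sketch of short repr combined with the 'add .0 only when there's no exponent' convention. It is a hypothetical, brute-force illustration, not the code in the branch: short_repr is a made-up name, and instead of Gay's algorithm it simply tries 1, 2, ..., 17 significant digits until float() gives back exactly x. It relies on Python's own correctly rounded format() and float() (on a typical modern CPython these are themselves built on Gay's code). It switches to exponential notation at 1e16, as described above for repr, and it isn't guaranteed to match CPython's repr digit for digit in every edge case.

```python
import math


def short_repr(x: float) -> str:
    """Hypothetical sketch of short float repr (not the branch's code).

    Finds the fewest significant digits that survive a float -> str -> float
    round trip by brute force, then lays the digits out using repr's
    notation rules: fixed notation for decimal exponents in [-4, 16),
    exponential notation otherwise, and a trailing '.0' only in fixed
    notation.
    """
    # Special values are spelled the same way repr spells them.
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf" if x > 0 else "-inf"
    if x == 0.0:
        return "-0.0" if math.copysign(1.0, x) < 0 else "0.0"

    # Try 1, 2, ..., 17 significant digits; 17 always suffice for a double.
    for ndigits in range(1, 18):
        s = format(x, f".{ndigits - 1}e")  # correctly rounded to ndigits
        if float(s) == x:
            break

    mantissa, exp_text = s.split("e")
    exponent = int(exp_text)
    sign = "-" if mantissa.startswith("-") else ""
    digits = mantissa.lstrip("-").replace(".", "").rstrip("0")

    if -4 <= exponent < 16:
        # Fixed notation, e.g. 0.03 or 100000000000.5.
        if exponent < 0:
            body = "0." + "0" * (-exponent - 1) + digits
        else:
            int_part = digits[: exponent + 1].ljust(exponent + 1, "0")
            frac_part = digits[exponent + 1:]
            # The '.0' only serves to mark the result as a float, not an int.
            body = int_part + "." + (frac_part or "0")
    else:
        # Exponential notation, e.g. 1e+200: no '.0' is added here.
        lead, rest = digits[0], digits[1:]
        body = lead + ("." + rest if rest else "") + f"e{exponent:+03d}"
    return sign + body


if __name__ == "__main__":
    for value in (0.03, 0.04, 1e11 + 0.5, 1e200):
        text = short_repr(value)
        print(text, float(text) == value)
    # 0.03 True
    # 0.04 True
    # 100000000000.5 True
    # 1e+200 True
```

Trying every precision in turn is simple but slow (up to 17 format/parse round trips per call); generating the shortest digits directly is the kind of work Gay's dtoa does instead.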
This email is already too long. I'll stop now.

Mark
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com