Thanks Paul, I wrapped this into the docs: http://git-wip-us.apache.org/repos/asf/couchdb/commit/bbd93f77
and on the way wrote a guide on how to contribute to the docs for the
rest of you: http://git-wip-us.apache.org/repos/asf/couchdb/commit/1f5695dd

Please make plenty of use of this! :)

Best
Jan
--

On Feb 20, 2013, at 04:00, Paul Davis <[email protected]> wrote:

> Apologies for not being able to express myself earlier this morning.
> I'd been without sleep for entirely too long.
>
> Robert Newson hits the nail on the head. The issue, succinctly stated,
> is this:
>
> Any numbers defined in JSON that contain a decimal point or exponent
> will be passed through the Erlang VM's idea of the "double" data type.
> Any numbers that are used in views will pass through the view's idea of
> a number (the common JavaScript case means even integers pass through
> a double due to JavaScript's definition of a number).
>
> (This is roughly a "no matter what" proposition until we decide to
> massively overhaul a significant portion of CouchDB internals to not
> interpret JSON into an internal representation, which is not impossible
> but not likely for quite some time.)
>
> What people are discussing in this particular thread is how we encode
> those numbers after they have been passed through some internal
> representation. While it can be a bit surprising and a number of
> people have said "but couchdb changes my data!" it's really not true
> (with a caveat). What happens is that CouchDB changes the textual
> representation of the result of decoding what it was given into some
> numerical format. In most cases this is an IEEE-754 double precision
> floating point number, which is exactly what almost all other languages
> use as well.
>
> What CouchDB does a bit differently than other languages is that it
> does not attempt to pretty print the resulting output to use the
> fewest characters. For instance, this is why we have this
> relationship:
>
> > ejson:encode(ejson:decode(<<"1.1">>)).
> <<"1.1000000000000000888">>
>
> What people are missing here is that internally those two formats
> decode into the same IEEE-754 representation. And more importantly, it
> will decode into a fairly close representation when passed through all
> major parsers that I know about.
>
> While we've only been discussing cases where the textual
> representation changes, another important case is when an input value
> contains more precision than can actually be represented in a double.
> (You could argue that this case is actually "losing" data if you don't
> accept that numbers are stored in doubles.)
>
> Here's a log for a couple of the more common JSON libraries I happen
> to have on my machine:
>
> Spidermonkey
>
> $ js -h 2>&1 | head -n 1
> JavaScript-C 1.8.5 2011-03-31
> $ js
> js> JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> "1.0123456789012346"
> js> var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> js> JSON.stringify(JSON.parse(f))
> "1.0123456789012346"
>
> Node
>
> $ node -v
> v0.6.15
> $ node
> > JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> '1.0123456789012346'
> > var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> undefined
> > JSON.stringify(JSON.parse(f))
> '1.0123456789012346'
>
> $ python
> Python 2.7.2 (default, Jun 20 2012, 16:23:33)
> [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import json
> >>> json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
> '1.0123456789012346'
> >>> f = json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
> >>> json.dumps(json.loads(f))
> '1.0123456789012346'
>
> Ruby
>
> A small aside on Ruby: it requires a top level object or array, so I just
> wrapped the value. It should be obvious that this doesn't affect the
> result of parsing the number though.
>
> $ irb --version
> irb 0.9.5(05/04/13)
> >> require 'JSON'
> => true
> >> JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
> => "[1.01234567890123]"
> >> f = JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
> => "[1.01234567890123]"
> >> JSON.dump(JSON.load(f))
> => "[1.01234567890123]"
>
>
> # ejson (CouchDB's current parser) at CouchDB sha 168a663b
>
> $ ./utils/run -i
> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:4] [hipe] [kernel-poll:true]
>
> Eshell V5.8.5 (abort with ^G)
> 1> ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
> <<"1.0123456789012346135">>
> 2> F = ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
> <<"1.0123456789012346135">>
> 3> ejson:encode(ejson:decode(F)).
> <<"1.0123456789012346135">>
>
>
> As you can see, they all pretty much behave the same, except that Ruby
> actually does appear to be losing some precision compared to the other
> libraries.
>
> The astute observer will notice that ejson (the CouchDB JSON library)
> reported an extra three digits. While it's tempting to think that this
> is due to some internal difference, it's just a more specific case of
> the 1.1 input as described above.
>
> The important point to realize here is that a double can only hold a
> finite number of values. What we're doing here is generating a string
> that, when passed through the "standard" floating point parsing
> algorithms (e.g., strtod), will result in the same bit pattern in memory
> as we started with. Or, put slightly differently, the bytes in a JSON
> serialized number are chosen such that they refer to a single specific
> value that a double can represent.
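
The claim that both spellings of 1.1 pick the same double is easy to check
yourself. Below is a minimal sketch in Python 3, standard library only. The
helper name `bits` is made up for illustration, and the `.20g` format is an
assumption chosen because it reproduces the digits in the ejson output
quoted above; it is not a statement about how ejson formats numbers
internally.

```python
import json
import struct


def bits(x: float) -> str:
    """Return the 64-bit IEEE-754 pattern of x as a hex string (hypothetical helper)."""
    return struct.pack(">d", x).hex()


short = json.loads("1.1")
long_form = json.loads("1.1000000000000000888")

# Both spellings select the same double, bit for bit.
same_bits = bits(short) == bits(long_form)

# Python's json module emits the shortest string that round-trips.
shortest = json.dumps(long_form)

# Asking for 20 significant digits (an assumed format) exposes more of
# the stored value instead of the shortest round-trip string.
twenty_digits = format(short, ".20g")

print(same_bits, shortest, twenty_digits)
```

So the longer string is not a different number, only a longer name for the
same one.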
>
> The game that other JSON libraries are playing is merely:
>
> "How few characters do I have to use to select this specific value for a
> double?"
>
> And that game has lots and lots of subtle details that are difficult
> to duplicate in C without a significant amount of effort (it took
> Python over a year to get it sorted with their fancy build systems
> that automatically run on a number of different architectures).
>
> Hopefully I've shown that CouchDB is not doing anything "funky" by
> changing input. It's behaving the same as any other common JSON library
> does, it's just not pretty printing its output.
>
> On the other hand, if you actually are in a position where an IEEE-754
> double is not a satisfactory datatype for your numbers, then the
> answer, as has been stated, is to not pass your numbers through this
> representation. In JSON this is accomplished by encoding them as a
> string or by using integer types (although integer types can still
> bite you if you use a platform that has a different integer
> representation than normal, e.g., JavaScript).
>
> Also, if anyone is really interested in changing this behavior, I'm
> all ears for contributions to jiffy (which is theoretically going to
> replace ejson when I get around to updating the build system). The
> places I've looked for inspiration are TCL and Python. If you know a
> decent implementation of this float printing algorithm, give me a
> holler.
>
> On Tue, Feb 19, 2013 at 3:58 PM, Tibor Gemes <[email protected]> wrote:
>> It's against best practices to use floats for representing money. If you
>> count pennies, then interpret the amount in pennies with int.
>> If this inconsistency is unacceptable, then you must use int. You should
>> use float only if this does not matter.
>> T
>> 2013.02.19. 22:48, "Robert Newson" <[email protected]> wrote:
>>
>>> I agree entirely with your last statement (I filed
>>> https://issues.apache.org/jira/browse/COUCHDB-1410 for exactly that
>>> reason).
>>>
>>> However, I've been convinced that it cannot be done with
>>> JavaScript/JSON's meaning of number, hence the suggestion to protect
>>> your values inside strings (which will not be altered or interpreted)
>>> and use math functions that operate on them (the various bignum.js
>>> libraries, for example). Another way to think of this is by
>>> comparison; if you would be happy, in Java, to exclusively use
>>> doubles, you'd be fine here. An important place where that is not
>>> acceptable is money (and, related, currency). You can't invent
>>> pennies.
>>>
>>> I'm +1 on including such a feature in a future release of CouchDB, but
>>> I don't think I got consensus on the idea so far (since it can be done
>>> today without such an extension).
>>>
>>> B.
>>>
>>>
>>> On 19 February 2013 21:39, Luca Morandini <[email protected]> wrote:
>>>> On 02/20/2013 08:23 AM, Robert Newson wrote:
>>>>>
>>>>>
>>>>> The numbers are not being changed, you are simply being exposed to the
>>>>> truth. :)
>>>>
>>>>
>>>> Nicely and concisely put, though it must be noted that Node.js, for
>>>> instance, keeps hiding the truth, hence there is a bit of inconsistency.
>>>>
>>>> But what if I rely on that low-fidelity representation?
>>>> This is a DBMS, people expect to get exactly what they put into it.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Luca Morandini
>>>> Data Architect - AURIN project
>>>> Department of Computing and Information Systems
>>>> University of Melbourne
>>>> Tel. +61 03 903 58 380
>>>> Skype: lmorandini
>>>>
>>>
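
For the money case raised in the thread, the suggestion to keep values in
strings can be sketched roughly as follows (Python 3, standard library
only). The document shape and the field names `price` and `qty` are
invented for illustration; nothing here is CouchDB-specific.

```python
import json
from decimal import Decimal

# The price travels as a JSON string, so no parser turns it into a double.
doc = json.loads('{"price": "19.99", "qty": 3}')

# Decimal arithmetic on the string value stays exact to the penny.
total = Decimal(doc["price"]) * doc["qty"]

# For contrast, a classic case of representation error with doubles.
float_total = 0.1 + 0.2

# Integers are only safe while they fit in a double on platforms that
# store every number as one: 2**53 and 2**53 + 1 collapse to one value.
big = 2**53
collides = float(big) == float(big + 1)

print(total, float_total, collides)
```

Whether strings or integer pennies fit better depends on what the clients
reading the documents can do with them.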
