On 19 February 2014 at 13:55:43, Suraj Kumar ([email protected]) wrote:
> Hi,
>
> If we put documents with same field name twice, we see both keys together
> in the document.
>
> suraj@laptop:~ $ curl -d '{"_id":"doc","key":"value","key":"value"}' -X PUT
> http://mydbhost:5984/test/doc
> {"ok":true,"id":"doc","rev":"1-49200ce1b14d686a961d10af01026cf8"}
> suraj@laptop:~ $ curl http://mydbhost:5984/test/doc
> {"_id":"doc","_rev":"1-49200ce1b14d686a961d10af01026cf8","key":"value","key":"value"}
>
>
> This seems wrong. While JS engines (node as well as mozjs) seem to be
> correctly 'overwriting' the key, why is couch storing everything? Is this a
> bug?
>
> Or am I wrong? (I'm using version 1.4.0)
>
> Regards,
>
> -Suraj
TL;DR: the appropriately named ECMA-404 JSON spec [1] is broken or, more
politely, insufficiently specific.
This and other edge cases are not even mentioned. The RFC is marginally better
[2] (see below), but even Crockford isn’t sure what should happen [3]. The more
recent ECMAScript 5.1 [4] says “NOTE In the case where there are duplicate name
Strings within an object, lexically preceding values for the same key shall be
overwritten”.
“The nice thing about standards is that there are so many of them to choose
from.”
— Andrew S. Tanenbaum
A JSON object is typically parsed into a dictionary or hash map, and there’s no
particular reason for that data structure to enforce uniqueness of keys. For
example, Erlang has both unique-key and repeated-key data structures available.
JavaScript presumably only has the unique flavour.
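
To make the two flavours concrete, here is a minimal sketch in Python (not what
CouchDB itself runs). It parses a document shaped like the one in the original
report, once into a list of pairs and once into a plain dict. The values
"first" and "second" are my own, changed from the report so the difference is
visible.

```python
import json

# Shaped like the PUT body in the report; values changed for illustration.
raw = '{"_id":"doc","key":"first","key":"second"}'

# Repeated-key structure: a list of (name, value) pairs keeps every member.
pairs = json.loads(raw, object_pairs_hook=list)

# Unique-key structure: a dict holds one value per name, and the last wins.
unique = json.loads(raw)

print(pairs)
print(unique)
```

The dict result matches the "lexically preceding values shall be overwritten"
behaviour, while the list of pairs is closer to what the stored document above
looks like.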
From the IETF RFC:
“The names within an object SHOULD be unique.”
“A JSON parser MUST accept all texts that conform to the JSON grammar.”
Now you *could* have a JSON parser that arbitrarily decides to delete some of
your data before passing it to the storage engine to save on disk. Personally
I’d rather CouchDB kept the duplicates, but until we see a
content-type: application/json2 that specifies how to handle these important
edge cases, I guess the status quo is not unreasonable? The alternative is to
return an invalid_json error, which would be incorrect, since the document
does conform to the JSON grammar.
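
For comparison, the "return an error" alternative would look roughly like the
sketch below. This is only an illustration in Python; reject_duplicates and
strict_loads are hypothetical names of mine, not anything in CouchDB, and the
MUST quoted above is the reason I would not want a server to behave this way.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict but refuse repeated member names."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name in object: {name!r}")
        result[name] = value
    return result


def strict_loads(text):
    """Parse JSON text, raising ValueError if any object repeats a name."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


if __name__ == "__main__":
    try:
        strict_loads('{"_id":"doc","key":"value","key":"value"}')
    except ValueError as err:
        print("rejected:", err)
```

A client that cares about unique names can run a check like this before
sending the document, whatever the server decides to do.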
The waters are muddied further because the conversion between JSON docs and
CouchDB’s on-disk format is handled in Erlang/OTP, with a parser that keeps
repeated keys, while the view engine will be in JavaScript and presumably will
do some cleaning up depending on which spec that JS engine supports… YMMV.
--
Dave Cottlehuber
Sent from my PDP11
[1]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
[2]: http://tools.ietf.org/html/rfc4627#section-2.2
[3]: http://esdiscuss.org/topic/json-duplicate-keys
[4]: http://es5.github.io/x15.12.html