On 19 February 2014 at 13:55:43, Suraj Kumar ([email protected]) wrote:
> Hi,
>  
> If we put documents with same field name twice, we see both keys together
> in the document.
>  
> suraj@laptop:~ $ curl -d '{"_id":"doc","key":"value","key":"value"}' -X PUT
> http://mydbhost:5984/test/doc
> {"ok":true,"id":"doc","rev":"1-49200ce1b14d686a961d10af01026cf8"}
> suraj@laptop:~ $ curl http://mydbhost:5984/test/doc
> {"_id":"doc","_rev":"1-49200ce1b14d686a961d10af01026cf8","key":"value","key":"value"}
>   
>  
> This seems wrong. While JS engines (node as well as mozjs) seem to be
> correctly 'overwriting' the key, why is couch storing everything? Is this a
> bug?
>  
> Or am I wrong? (I'm using version 1.4.0)
>  
> Regards,
>  
> -Suraj

TL;DR: the appropriately named ECMA-404 JSON spec [1] is broken or, more
politely, insufficiently specific.

Duplicate names and other edge cases are not even mentioned there. The RFC is
marginally better [2], see below, but even Crockford isn’t sure what should
happen [3]. The more recent ECMAScript 5.1 [4] says “NOTE In the case where
there are duplicate name Strings within an object, lexically preceding values
for the same key shall be overwritten”.

    “The nice thing about standards is that there are so many of them to
    choose from.”
        — Andrew S. Tanenbaum

JSON objects are typically parsed into a dictionary or hash map, and there’s no
particular reason for that data structure to enforce uniqueness of keys. For
example, Erlang has both unique-key and repeated-key data structures available.
JavaScript presumably only has the unique flavour.
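The difference between the two flavours can be sketched in Python, which is used here only for illustration and is not what CouchDB runs. The standard json module builds a dict by default, so the last duplicate wins, but its object_pairs_hook parameter hands over the raw list of name/value pairs, which keeps every duplicate. The sample document is made up, with distinct values so the effect is visible.

```python
import json

# Hypothetical document with a repeated name and distinct values.
text = '{"_id":"doc","key":"first","key":"second"}'

# Unique-key flavour: a dict, so the later "key" overwrites the earlier one.
as_dict = json.loads(text)

# Repeated-key flavour: keep the ordered (name, value) pairs as parsed.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)   # {'_id': 'doc', 'key': 'second'}
print(as_pairs)  # [('_id', 'doc'), ('key', 'first'), ('key', 'second')]
```

Both results come from the same valid JSON text; only the choice of in-memory structure decides whether the first value survives.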

From the IETF RFC:

        “The names within an object SHOULD be unique.”
        “A JSON parser MUST accept all texts that conform to the JSON grammar.”

Now you *could* have a JSON parser that decides arbitrarily to delete some of
your data before passing it to the storage engine to save on disk. Personally
I’d rather CouchDB keeps the duplicates, but until we see a
content-type:application/json2 that specifies how to handle these important
edge cases, I guess the status quo is not unreasonable? The alternative is to
return an invalid_json error, which would be incorrect because the document
does conform to the JSON grammar.
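For comparison, the rejecting alternative would look roughly like the following Python sketch. It is purely illustrative: reject_duplicates and strict_loads are hypothetical helper names, and nothing here reflects CouchDB's actual parser. The hook raises as soon as a name repeats within an object, instead of silently keeping or dropping a value.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical strict policy: fail on a repeated name in one object."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate name: {name!r}")
        seen[name] = value
    return seen


def strict_loads(text):
    # json.loads calls the hook once per JSON object, innermost first.
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(strict_loads('{"_id":"doc","key":"value"}'))

try:
    strict_loads('{"_id":"doc","key":"value","key":"value"}')
except ValueError as err:
    print("rejected:", err)
```

Note that the rejected text is still grammatically valid JSON, which is exactly why refusing it sits awkwardly with the MUST quoted from the RFC.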

The waters are muddied further because the conversion between JSON docs and
the CouchDB on-disk format is handled in Erlang/OTP, with a parser that makes
this distinction, while the view engine runs in JavaScript and will presumably
do some cleaning up depending on which spec that JS engine supports… YMMV.

--  
Dave Cottlehuber
Sent from my PDP11

[1]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
[2]: http://tools.ietf.org/html/rfc4627#section-2.2
[3]: http://esdiscuss.org/topic/json-duplicate-keys
[4]: http://es5.github.io/x15.12.html

