On Dec 28, 2008, at 9:00 AM, Paul Davis wrote:

On Sun, Dec 28, 2008 at 8:47 AM, Geir Magnusson Jr. <[email protected]> wrote:

On Dec 28, 2008, at 8:26 AM, Paul Davis wrote:

You're pretty much spot on here. "id" and "key" both refer to the
"_id" field in a document. And the "rev" does indeed refer to the
"_rev" attribute. Why "id" and "rev" are used instead of "_id" and
"_rev" I couldn't really tell you. I hate to say "historical reasons"
but I'm guessing that when Damien designed the view output he just
labeled them "id" and "rev" without the underscore because it's not
needed to distinguish from the rest of the doc.

Ok, cool. So... can key be something else? Or should I assume that "key"
is a synonym for "_id"?


It's a bit misleading because you chose _all_docs as the first view you
looked at. Really _all_docs is a special internal view that CouchDB
provides. When you get to defining your own views, you learn that
views are created by emit'ing key/value pairs that are arbitrary JSON
objects (no _id/_rev complaints even). So yes, key can be whatever you
want when defining a custom view.
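
To make the key/value point concrete, here is a minimal sketch in plain Python that imitates what a custom view does. It is not CouchDB's API: `build_view` and `by_author` are hypothetical names, the document fields (`author`, `year`, `title`) are made up, and sorting by a JSON string is a stand-in for CouchDB's real key collation. Real map functions are written in JavaScript and call emit(key, value).

```python
import json

def build_view(docs, map_fn):
    """Run map_fn over every document and collect the emitted rows.

    Each row carries the source document's _id under "id", plus the
    emitted key and value. Hypothetical helper, not a CouchDB call.
    """
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    # Simplification: order rows by the JSON text of the key.
    # CouchDB applies its own collation rules instead.
    rows.sort(key=lambda row: json.dumps(row["key"], sort_keys=True))
    return rows

def by_author(doc):
    """Emit an arbitrary JSON key (here a list) for docs that have an author."""
    if "author" in doc:
        yield [doc["author"], doc["year"]], doc["title"]

docs = [
    {"_id": "doc-a", "_rev": "1-x", "author": "bob", "year": 2008, "title": "Views"},
    {"_id": "doc-b", "_rev": "1-y", "author": "alice", "year": 2007, "title": "Storage"},
    {"_id": "doc-c", "_rev": "1-z", "note": "no author, so nothing is emitted"},
]

rows = build_view(docs, by_author)
for row in rows:
    print(row)
```

The key in each row is whatever the map function chose to emit, while "id" still points back at the document the row came from.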

I read the view docs (and have other questions there, like if the M/R is distributed across a cluster - I've used M/R w/ Hadoop, so I come w/ a set of assumptions...) and I saw that it doesn't *appear* that the key or id is injected in the view doc, which of course brings up an obvious question :)



[SNIP]

{
 _id : whatever,
 _rev : whatever,
 doc : { ..... the full user document that can have _id, _rev and whatever.... }
}



Like Noah says, reserving underscore prefixed fields as private to
CouchDB doesn't make it not JSON. I'd argue that putting the document stuff inside a doc member would probably be an annoyance, in that every operation on the doc would require doc.doc.foo instead of just doc.foo.

I certainly understand that there are tradeoffs. We do the same thing at 10gen - modify the user's document for storage. Some random thoughts:

1) Doing an insert requires that the user document be deserialized (maybe only partially?), the additional fields inserted, and then re-serialized for storage. Having a metadata envelope means that the user document keyspace and the server's metadata keyspace are totally decoupled.
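
The difference between the two storage shapes can be sketched roughly as follows. This is an illustration only, not how CouchDB or 10gen actually store documents: `store_inline` and `store_enveloped` are hypothetical names, and the envelope version assumes the incoming body is already valid JSON (a real server would presumably still validate it).

```python
import json

def store_inline(raw_body, doc_id, rev):
    """Inject metadata into the user's document: parse, modify, re-serialize."""
    doc = json.loads(raw_body)        # full deserialization
    doc["_id"] = doc_id               # server fields share the user's keyspace
    doc["_rev"] = rev
    return json.dumps(doc)            # full re-serialization

def store_enveloped(raw_body, doc_id, rev):
    """Wrap the user's bytes in a metadata envelope without parsing them.

    Assumes raw_body is valid JSON; it is spliced in untouched.
    """
    meta = json.dumps({"_id": doc_id, "_rev": rev})
    # Drop the closing brace of the metadata object and append the body.
    return meta[:-1] + ', "doc": ' + raw_body + "}"

# The user document is free to carry its own "_id" inside the envelope.
raw = '{"_id": "mine", "foo": 1}'
inline = json.loads(store_inline(raw, "abc", "1-xyz"))
enveloped = json.loads(store_enveloped(raw, "abc", "1-xyz"))

print(inline)      # the user's "_id" has been overwritten
print(enveloped)   # both "_id" values survive, at different levels
```

With the envelope, reading a user field costs one extra hop (`enveloped["doc"]["foo"]`), which is the tradeoff discussed in this thread.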


I fail to see how these two points are related, but at the moment
partial de/serialization is not done in CouchDB. It's been discussed
(extensively) and has been more or less put on hold until there is a
JSON community supported diff format. Though, come to think of it,
that'll still require a full de/serialization round trip.

You're right - it's not related from the POV of making it convenient to access fields w/o the extra reference hop. I was just making a list of issues related to an envelope...

I'll go look at the dev archive to see if I can get a hint about what you are referring to.



2) It prevents, or at least makes harder, any document security - any hash function would have to account for the fact that there may be external keys injected into the document ("_*"). This is doable, but it means your code - which was handling "generic JSON" - now has to know that it's working w/ a couchdb store....


I don't follow.

Suppose I wanted to ensure that my data isn't modified - I could produce a cryptographic signature of my JSON doc, add that to the doc, and then store it. But when it comes back, it now has two magical fields added - _id and _rev - which I'd have to remove before recalculating my hash.

That's doable of course, but if I had some generalized library for doing this, there would have to be special handling when a doc is stored in couchdb vs other places (written to disk, tattooed on a hamster, whatever...)
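
A rough sketch of that special handling, using Python's standard library. The assumptions are mine: HMAC-SHA256 stands in for "a cryptographic signature", the `sig` field name is hypothetical, sorted-key JSON is used as the canonical form, and only top-level fields starting with an underscore are treated as injected by the store.

```python
import hashlib
import hmac
import json

def canonical(doc):
    """Serialize deterministically so the same doc always hashes the same."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(doc, secret):
    """Return a copy of doc with a hypothetical "sig" field added."""
    body = {k: v for k, v in doc.items() if k != "sig"}
    signed = dict(body)
    signed["sig"] = hmac.new(secret, canonical(body), hashlib.sha256).hexdigest()
    return signed

def verify(doc, secret, strip_store_fields=False):
    """Check the signature, optionally ignoring "_"-prefixed store fields."""
    body = {k: v for k, v in doc.items() if k != "sig"}
    if strip_store_fields:
        # The store-specific step: drop injected fields such as _id and _rev.
        body = {k: v for k, v in body.items() if not k.startswith("_")}
    expected = hmac.new(secret, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, doc.get("sig", ""))

secret = b"example-key"
signed = sign({"title": "hello", "n": 1}, secret)

# Simulate what comes back from the store: same doc plus _id and _rev.
stored = dict(signed, _id="abc", _rev="1-xyz")

print(verify(signed, secret))                           # untouched doc
print(verify(stored, secret))                           # injected fields break it
print(verify(stored, secret, strip_store_fields=True))  # store-aware check
```

The `strip_store_fields` flag is exactly the piece a generic signing library would otherwise not need, and it would also discard any user field that happens to start with an underscore.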



3) The doc.doc.foo problem - is that really a problem? I haven't worked w/ couch yet to understand the common access patterns, but it seems that the different calls to the REST API return things of different "shape" anyway... if you are accessing by document id, you could just get the user doc back, and it seems that other queries return metadata anyway (e.g. _all_docs) so people must be used to pulling the user doc out of the framing data.... You could solve the issue in MR easily as well.


It's not a *problem*; it'd just annoy me to have to type doc.doc.foo
instead of doc.foo.

Of course.  And I think that things that annoy me are problems :)



Anyway, I don't want this to distract :) It's just a subject I'm interested
in, as it's a personal pet peeve...

geir




HTH,
Paul Davis



Apologies if I seem confused. I haven't been to sleep since a long time ago.

All is well - thanks for the help.   I'll keep reading and playing.

geir



HTH,
Paul Davis
