On Dec 28, 2008, at 9:00 AM, Paul Davis wrote:
On Sun, Dec 28, 2008 at 8:47 AM, Geir Magnusson Jr. wrote:
On Dec 28, 2008, at 8:26 AM, Paul Davis wrote:
You're pretty much spot on here. "id" and "key" both refer to the
"_id" field in a document. And the "rev" does indeed refer to the
"_rev" attribute. Why "id" and "rev" are used instead of "_id" and
"_rev" I couldn't really tell you. I hate to say "historical reasons"
but I'm guessing that when Damien designed the view output he just
labeled them "id" and "rev" without the underscore because it's not
needed to distinguish them from the rest of the doc.
Ok, cool. So... can key be something else? Or should I assume that
"key" is a synonym for "_id"?
It's a bit misleading because you chose _all_docs as the first view you
looked at. Really _all_docs is a special internal view that CouchDB
provides. When you get to defining your own views, you learn that
views are created by emit'ing key/value pairs that are arbitrary JSON
objects (no _id/_rev complaints even). So yes, key can be whatever you
want when defining a custom view.
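
A minimal sketch of the point above, written in Python rather than the JavaScript that CouchDB views actually use. It only imitates the idea that a map function emits arbitrary key/value pairs and that each resulting row carries the "id" of the document it came from. The names build_view and by_author_and_year, the sample documents, and the plain Python sort are all assumptions made for illustration; none of this is CouchDB's API or its collation rules.

```python
def build_view(docs, map_fn):
    """Run map_fn over every document and collect the rows it emits."""
    rows = []
    for doc in docs:
        # Each emitted row remembers which document produced it.
        def emit(key, value, _doc=doc):
            rows.append({"id": _doc["_id"], "key": key, "value": value})
        map_fn(doc, emit)
    # Plain Python ordering, standing in for the view's key ordering.
    rows.sort(key=lambda row: row["key"])
    return rows


def by_author_and_year(doc, emit):
    """Hypothetical map function: the key is a JSON-style array, not the _id."""
    if "author" in doc:
        emit([doc["author"], doc["year"]], doc["title"])


docs = [
    {"_id": "a1", "_rev": "1", "author": "smith", "year": 2008, "title": "Views"},
    {"_id": "b2", "_rev": "1", "author": "jones", "year": 2007, "title": "Docs"},
    {"_id": "c3", "_rev": "1", "note": "no author, so nothing is emitted"},
]

rows = build_view(docs, by_author_and_year)
for row in rows:
    print(row)
```

In this sketch "key" is whatever the map function chose, while "id" still points back at the source document's "_id".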
I read the view docs (and have other questions there, like if the M/R
is distributed across a cluster - I've used M/R w/ Hadoop, so I come
w/ a set of assumptions...) and I saw that it doesn't *appear* that
the key or id is injected in the view doc, which of course brings up
an obvious question :)
[SNIP]
{
  _id : whatever,
  _rev : whatever,
  doc : { ..... the full user document that can have _id, _rev and whatever.... }
}
Like Noah says, reserving underscore prefixed fields as private to
CouchDB doesn't make it not JSON. I'd argue that putting the document
stuff inside a doc member would probably be an annoyance in that every
operation on the doc would require doc.doc.foo instead of just doc.foo.
I certainly understand that there are tradeoffs. We do the same thing at
10gen - modify the user's document for storage. Some random thoughts:
1) doing an insert requires that the user document be deserialized (maybe
only partially?), the additional fields inserted, and then re-serialized for
storage. Having a metadata envelope means that the user document keyspace and
the server's metadata keyspace are totally decoupled.
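
A rough Python sketch of the two storage shapes being compared in point 1. It is not how CouchDB or 10gen store documents; store_in_place and store_in_envelope are hypothetical helpers, and the envelope version assumes the incoming text is already valid JSON, since it is spliced in without being parsed.

```python
import json


def store_in_place(raw_user_json, doc_id, rev):
    """Merge metadata into the user's document: full parse, then re-serialize."""
    doc = json.loads(raw_user_json)
    doc["_id"] = doc_id    # overwrites any "_id" the user supplied
    doc["_rev"] = rev
    return json.dumps(doc)


def store_in_envelope(raw_user_json, doc_id, rev):
    """Wrap the user's text untouched; only the small header is serialized."""
    header = json.dumps({"_id": doc_id, "_rev": rev})
    # Drop the header's closing brace and append the raw text as "doc".
    return header[:-1] + ', "doc": ' + raw_user_json + "}"


user_json = '{"_id": "mine", "foo": 1}'

flat = json.loads(store_in_place(user_json, "abc", "1"))
wrapped = json.loads(store_in_envelope(user_json, "abc", "1"))

print(flat)      # one shared keyspace: the user's "_id" is gone
print(wrapped)   # two keyspaces: wrapped["_id"] and wrapped["doc"]["_id"]
```

The cost of the envelope shows up on the read side, where a field is reached as wrapped["doc"]["foo"] rather than flat["foo"].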
I fail to see how these two points are related, but at the moment
partial de/serialization is not done in CouchDB. It's been discussed
(extensively) and has been more or less put on hold until there is a
JSON community supported diff format. Though, come to think of it,
that'll still require a full de/serialization round trip.
You're right - it's not related from the POV of making it convenient
to access fields w/o the extra reference hop. I was just making a
list of issues related to an envelope...
I'll go look at the dev archive to see if I can get a hint about what
you are referring to.
2) It prevents, or at least makes harder, any document security - any hash
function would have to account for the fact that there may be external keys
injected into the document ("_*"). This is doable, but it means your code
- which was handling "generic JSON" - now has to know that it's working w/
a couchdb store....
I don't follow.
Suppose I wanted to ensure that my data isn't modified - I could
produce a cryptographic signature of my JSON doc, add that to the doc,
and then store it. But when it comes back, it now has two magical
fields added - _id and _rev - which I'd have to remove before
re-calculating my hash.
That's doable of course, but if I had some generalized library for
doing this, there would have to be special handling when a doc is
stored in couchdb vs other places (written to disk, tattooed on a
hamster, whatever...)
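
To make the special handling concrete, here is a small Python sketch under stated assumptions: a SHA-256 digest over sorted-key JSON stands in for a real cryptographic signature, the "sig" field name is made up, and strip_store_fields is the hypothetical store-specific step that a generic library would otherwise not need.

```python
import hashlib
import hmac
import json


def strip_store_fields(doc):
    """Store-specific step: drop the underscore-prefixed fields the store added."""
    return {k: v for k, v in doc.items() if not k.startswith("_")}


def digest(doc):
    """Hash a canonical (sorted-key, compact) JSON rendering of the document."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign(doc):
    """Return a copy of doc with its digest stored under the made-up key 'sig'."""
    signed = dict(doc)
    signed["sig"] = digest(doc)
    return signed


def verify(stored):
    """Recompute the digest after removing the store's fields and the sig itself."""
    body = strip_store_fields(stored)
    claimed = body.pop("sig")
    return hmac.compare_digest(claimed, digest(body))


signed = sign({"foo": 1, "bar": ["a", "b"]})
# What comes back from the store: the same document plus _id and _rev.
stored = dict(signed, _id="abc", _rev="1")

naive_body = {k: v for k, v in stored.items() if k != "sig"}
print(digest(naive_body) == stored["sig"])  # check that ignores the store
print(verify(stored))                       # check that strips "_" fields
```

The naive check hashes "_id" and "_rev" along with the user's data, which is why the store-aware stripping step ends up in otherwise generic code.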
3) the doc.doc.foo problem - Is that really a problem? I haven't worked w/
couch yet to understand the common access patterns, but it seems that the
different calls to the REST API return things of different "shape" anyway...
if you are accessing by document id, you could just get the user doc back,
and it seems that other queries return metadata anyway (e.g. _all_docs) so
people must be used to pulling the user doc out of the framing data.... You
could solve the issue in MR easily as well.
It's not a *problem*; it'd just annoy me to have to type doc.doc.foo
instead of doc.foo.
Of course. And I think that things that annoy me are problems :)
Anyway, I don't want this to distract :) It's just a subject I'm interested
in, as it's a personal pet peeve...
geir
HTH,
Paul Davis
Apologies if I seem confused. I haven't been to sleep since a long
time ago.
All is well - thanks for the help. I'll keep reading and playing.
geir