On Dec 31, 2008, at 7:40 PM, Antony Blakey wrote:
On 31/12/2008, at 11:29 PM, Geir Magnusson Jr. wrote:
What trouble? I think this is *exactly* what should be done - have
CouchDB store documents that are :
{
metadata : { _rev : X, _id : Y, _woogie: Z, .... anything that
needs to be added in the future, such as a last update date... },
userdata : { .... the document you want to store .... }
}
and then offer APIs that let you :
a) get to this document, for libraries and clients that know they
are talking to Couch and want to manipulate at this level
b) return and accept the userdocument directly, for clients that
just want to consume or produce JSON data, w/o caring about the
internal housekeeping
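A rough sketch of those two access levels, assuming a hypothetical
in-memory EnvelopeStore (the class and method names are made up for
illustration and are not CouchDB's API):

```python
import copy


class EnvelopeStore:
    """Hypothetical store that wraps every user document in an envelope."""

    def __init__(self):
        self._docs = {}

    def put_userdoc(self, doc_id, userdoc):
        # Level (b): accept the bare user document and wrap it internally.
        old = self._docs.get(doc_id)
        rev = old["metadata"]["_rev"] + 1 if old else 1
        self._docs[doc_id] = {
            "metadata": {"_id": doc_id, "_rev": rev},
            "userdata": copy.deepcopy(userdoc),
        }
        return rev

    def get_envelope(self, doc_id):
        # Level (a): the full envelope, for clients that know about it.
        return copy.deepcopy(self._docs[doc_id])

    def get_userdoc(self, doc_id):
        # Level (b): only the document the client originally stored.
        return copy.deepcopy(self._docs[doc_id]["userdata"])
```

Because the housekeeping lives in its own member, whatever the client
stores comes back unchanged from get_userdoc().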
One of the issues complicating the logic of this discussion is that
the document id is both metadata and, conceptually, a document member.
Well, I don't understand why it has to be. Certainly it's a
convenience, and I wonder how much of current thinking has been
influenced by the fact that this is what people are used to.
I can understand why CDB needs a unique document identifier, and it
certainly would be nice to have the option of having it shoved into
the user doc on creation. But
a) I think that I should have the choice as to what that identifier
is (e.g. configure the database to inject the couch metadata _id as
"_couchID" or whatever...)
b) I should have the choice to not have it injected at all
So why do I think this is a problem? The 10gen appserver auto-injects
an id field into the JSON documents that are stored in our database,
Mongo. Can you guess what the key is? Yep - "_id"
So how can I roundtrip a doc from 10gen through couch and back? I
can't.
I've made the same argument at 10gen - that I should be able to set
the identifier (and that it shouldn't be in the doc in the first place).
Then, I'd just have a doc with
{
_couchID : ....
_mongoID : ....
... data...
}
(if I chose to shove the ID into the doc)
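To make options (a) and (b) concrete, here is a minimal sketch. The
inject_id helper and its id_key parameter are hypothetical, not an
existing CouchDB or Mongo setting:

```python
def inject_id(userdoc, doc_id, id_key="_couchID"):
    """Return a copy of userdoc with doc_id stored under id_key.

    Passing id_key=None disables injection entirely (option b).
    """
    result = dict(userdoc)
    if id_key is None:
        return result
    if id_key in result:
        # This is the round-trip problem: two systems claiming one key.
        raise ValueError(f"document already has a {id_key!r} member")
    result[id_key] = doc_id
    return result


mongo_doc = {"_mongoID": "m-1", "name": "example"}
both = inject_id(mongo_doc, "c-9")                    # adds "_couchID"
untouched = inject_id(mongo_doc, "c-9", id_key=None)  # no injection
```

With both systems hard-wired to "_id", the same call with id_key="_id"
on a document that already carries "_id" has nowhere to put the second
identifier, which is why the sketch raises instead of overwriting.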
That's why, although the purest model is to have the userdata as a
member within a Couch document as you suggest, this doesn't look
that appealing:
{
metadata: {
id: ...
rev: ...
...
}
data: {
... the user's document ...
}
}
I can see how this isn't appealing from the perspective of current
APIs, but a rethinking of this issue (_id and _rev) also warrants a
rethinking of the APIs to deal with this.
E.g. an API that lets me get a) the whole doc above b) metadata only
c) userdata only
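Something like the following, where select() and its part argument are
invented names and the envelope uses the metadata/data layout quoted
above:

```python
def select(envelope, part="all"):
    """Return the whole envelope, only its metadata, or only the user data."""
    if part == "all":
        return envelope
    if part == "meta":
        return envelope["metadata"]
    if part == "user":
        return envelope["data"]
    raise ValueError(f"unknown part: {part!r}")
```

A client that only consumes JSON would always ask for part="user" and
never see the housekeeping.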
Furthermore, from a scalability perspective, always having the
metadata when you have the document isn't a problem - the metadata
is constrained.
And from what I understand, it already exists in that manner, right?
I mean, for efficiency, I'd guess that the _id, _rev and, in the
future, other metadata (like insert date, last modification date...)
would be kept outside of the doc, so that they can be read and updated
w/o having to serialize/deserialize the whole user document.
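To illustrate the guess (and it is only a guess; this says nothing
about how couch actually stores things), a record could keep the
metadata as a small structure next to an opaque serialized body. The
Record class here is hypothetical:

```python
import json


class Record:
    """Hypothetical record: small metadata beside an opaque serialized body."""

    def __init__(self, doc_id, userdoc):
        self.meta = {"_id": doc_id, "_rev": 1}
        self.body = json.dumps(userdoc)  # serialized once, then left alone

    def bump_rev(self):
        # Only the metadata is touched; the body is never parsed.
        self.meta["_rev"] += 1
        return self.meta["_rev"]

    def userdoc(self):
        # Deserialization happens only when the user document is wanted.
        return json.loads(self.body)
```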
The reverse situation, always having the data when you have the
metadata, is not constrained because the data is arbitrarily large.
IMO this means that a solution such as this:
{
id: ...
rev: ...
...
data: {
... the user's document ...
}
}
isn't such a good idea compared to this:
{
_metadata: {
id: ...
rev: ...
}
... the user's document ...
}
That only solves the problem in that there's only one reserved magical
key (_metadata), but I don't think that really changes anything. You
still need to make sure any document you want to store in couch
doesn't have a top-level _metadata element.
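A small sketch of that remaining restriction, using made-up helper
names (embed_metadata and strip_metadata are not part of any couch
API):

```python
RESERVED_KEY = "_metadata"


def embed_metadata(userdoc, doc_id, rev):
    """Place metadata under the single reserved key inside the user document."""
    if RESERVED_KEY in userdoc:
        # The one thing a user document still may not contain.
        raise ValueError(f"top-level {RESERVED_KEY!r} is reserved")
    stored = {RESERVED_KEY: {"id": doc_id, "rev": rev}}
    stored.update(userdoc)
    return stored


def strip_metadata(stored):
    """Split a stored document back into (metadata, user document)."""
    user = {k: v for k, v in stored.items() if k != RESERVED_KEY}
    return stored[RESERVED_KEY], user
```

A document with its own "_id" member passes through untouched here,
but one with a top-level "_metadata" member is still rejected.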
And while I don't know how couch works internally, we *are* really
only talking about how the data is returned on an API call via the
REST API or what I assume is an internal API for the M/R View stuff.
If you had an API that let you choose all, metaonly or useronly, you
would not be burdened with stuff you didn't want or need.
Unfortunately the reserved token makes the structure non-reflexive
without transformation, and although that's not currently an issue,
I can imagine it complicating certain use-cases. It makes the system
more complicated to reason about.
I'm struggling to objectively evaluate this model and your reflexive
model - given Damien's attitude to this issue, my motivation to do
so is somewhat depressed :/
If you could point me to an explanation of why changing this is bad,
I'd love to catch up on the discussion. I assume it's a technical
reason?
geir
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Did you hear about the Buddhist who refused Novocain during a root
canal?
His goal: transcend dental medication.