On Dec 31, 2008, at 7:40 PM, Antony Blakey wrote:


On 31/12/2008, at 11:29 PM, Geir Magnusson Jr. wrote:

What trouble? I think this is *exactly* what should be done - have CouchDB store documents that look like this:

{
  metadata : { _rev : X, _id : Y, _woogie: Z, .... anything that needs to be added in the future, such as a last-update date ... },
  userdata : {  .... the document you want to store .... }
}

and then offer APIs that let you :

a) get to this document, for libraries and clients that know they are talking to Couch and want to manipulate at this level

b) return and accept the userdocument directly, for clients that just want to consume or produce JSON data, w/o caring about the internal housekeeping
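To make (a) and (b) concrete, here is a minimal sketch of the two access levels. The function names wrap and unwrap are hypothetical and not part of any CouchDB API; the envelope keys simply follow the layout above.

```python
from typing import Any, Dict


def wrap(user_doc: Dict[str, Any], doc_id: str, rev: str) -> Dict[str, Any]:
    """Build the stored envelope around an untouched user document."""
    return {
        "metadata": {"_id": doc_id, "_rev": rev},
        "userdata": user_doc,
    }


def unwrap(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the user document, for clients that ignore housekeeping."""
    return envelope["userdata"]


# The user document may carry its own "_id" without clashing with the
# database's identifier, because the two live in separate sub-objects.
stored = wrap({"_id": "mongo-123", "title": "hello"}, doc_id="couch-abc", rev="1")
plain = unwrap(stored)
```

A Couch-aware library would work with stored; a client that only produces or consumes JSON would only ever see plain.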

One of the issues complicating the logic of this discussion is that the document id is both metadata and, conceptually, a document member.

Well, I don't understand why it has to be. Certainly it's a convenience, and I wonder how much of current thinking has been influenced by the fact that this is what people are used to.

I can understand why CDB needs a unique document identifier, and it certainly would be nice to have the option of having it shoved into the user doc on creation. But

a) I think that I should have the choice as to what that identifier is (e.g. Configure the database to inject the couch metadata _id as "_couchID" or whatever...)

b) I should have the choice to not have it injected at all

So why do I think this is a problem? The 10gen appserver auto-injects an id field into the JSON documents that are stored in our database, Mongo. Can you guess what the key is? Yep - "_id"

So how can I roundtrip a doc from 10gen through couch and back? I can't.

I've made the same argument at 10gen - that I should be able to set the identifier (and that it shouldn't be in the doc in the first place).

Then, I'd just have a doc with

{
   _couchID : ....
   _mongoID : ....
    ... data...
}

(if I chose to shove the ID into the doc)
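As a rough sketch of what (a) and (b) could look like together, assume a per-database setting that names the injected key, with None meaning "do not inject". The function inject_id and its key parameter are made up for illustration; neither CouchDB nor Mongo offers this as written.

```python
from typing import Any, Dict, Optional


def inject_id(user_doc: Dict[str, Any], doc_id: str, key: Optional[str]) -> Dict[str, Any]:
    """Return a copy of user_doc with doc_id stored under key.

    key=None means the database keeps the identifier to itself and the
    user document is returned unchanged (as a copy).
    """
    out = dict(user_doc)
    if key is None:
        return out
    if key in out:
        # Refuse to overwrite a member the user already put there.
        raise KeyError(f"document already has a {key!r} member")
    out[key] = doc_id
    return out


doc = {"_mongoID": "m-1", "name": "example"}
with_couch = inject_id(doc, "c-1", key="_couchID")
untouched = inject_id(doc, "c-1", key=None)
```

With distinct keys on each side, the document can pass through both systems without either identifier being clobbered.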


That's why, although the purest model is to have the userdata as a member within a Couch document as you suggest, this doesn't look that appealing:

{
 metadata: {
   id: ...
   rev: ...
   ...
 }
 data: {
   ... the user's document ...
 }
}

I can see how this isn't appealing from the perspective of the current APIs, but a rethinking of this issue (_id and _rev) also warrants a rethinking of the APIs to deal with this.

E.g. an API that lets me get a) the whole doc above b) metadata only c) userdata only



Furthermore, from a scalability perspective, always having the metadata when you have the document isn't a problem - the metadata is constrained.

And from what I understand, it already exists in that manner, right? I mean, for efficiency, I'd guess that the _id, _rev and, in the future, other metadata (like insert date, last modification date...) would be kept outside of the doc, so that they can be read and updated w/o having to serialize/deserialize the whole user document.

The reverse situation, always having the data when you have the metadata, is not constrained because the data is arbitrarily large. IMO this means that a solution such as this:

{
 id: ...
 rev: ...
 ...
 data: {
   ... the user's document ...
 }
}

isn't such a good idea compared to this:

{
 _metadata: {
   id: ...
   rev: ...
 }
 ... the user's document ...
}

That only solves the problem in that there's only one reserved magical key (_metadata), but I don't think that really changes anything. You still need to make sure any document you want to store in couch doesn't have a top-level _metadata element.
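The remaining collision can be sketched as follows. This is only an illustration of the single-reserved-key layout; attach_metadata and detach_metadata are hypothetical names, not anything couch provides.

```python
from typing import Any, Dict, Tuple

RESERVED_KEY = "_metadata"  # the one magical top-level key in this layout


def attach_metadata(user_doc: Dict[str, Any], doc_id: str, rev: str) -> Dict[str, Any]:
    """Merge the metadata into the user document under the reserved key."""
    if RESERVED_KEY in user_doc:
        # The user document itself cannot use the reserved key.
        raise ValueError(f"user document must not contain {RESERVED_KEY!r}")
    return {RESERVED_KEY: {"id": doc_id, "rev": rev}, **user_doc}


def detach_metadata(stored: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a stored document back into (metadata, user document)."""
    meta = stored[RESERVED_KEY]
    user = {k: v for k, v in stored.items() if k != RESERVED_KEY}
    return meta, user


stored = attach_metadata({"_id": "mongo-123", "n": 1}, doc_id="couch-abc", rev="1")
meta, user = detach_metadata(stored)
```

A user "_id" now survives the round trip, but any document that already has a top-level "_metadata" member is still rejected.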

And while I don't know how couch works internally, we *are* really only talking about how the data is returned on an API call via the REST API or what I assume is an internal API for the M/R View stuff.

If you had an API that let you choose all, metaonly or useronly, you wouldn't be burdened with stuff you didn't want or need.
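Something like the following is what I have in mind, sketched against the metadata/userdata envelope from earlier in the thread. The function select_view and the view names are assumptions for illustration only; how couch would actually expose this (query parameter, separate URL, etc.) is left open.

```python
from typing import Any, Dict


def select_view(envelope: Dict[str, Any], view: str = "all") -> Dict[str, Any]:
    """Return the part of the stored envelope the caller asked for."""
    if view == "all":
        return envelope
    if view == "metaonly":
        return envelope["metadata"]
    if view == "useronly":
        return envelope["userdata"]
    raise ValueError(f"unknown view: {view!r}")


envelope = {
    "metadata": {"id": "couch-abc", "rev": "1"},
    "userdata": {"title": "hello"},
}
meta_only = select_view(envelope, "metaonly")
user_only = select_view(envelope, "useronly")
```

A caller that only needs the revision never has to receive an arbitrarily large user document, and a plain JSON consumer never sees the housekeeping.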


Unfortunately the reserved token makes the structure non-reflexive without transformation, and although that's not currently an issue, I can imagine it complicating certain use-cases. It makes the system more complicated to reason about.

I'm struggling to objectively evaluate this model and your reflexive model - given Damien's attitude to this issue, my motivation to do so is somewhat depressed :/

If you could point me to an explanation of why changing this is bad, I'd love to catch up on the discussion. I assume it's a technical reason?

geir



Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root canal?
His goal: transcend dental medication.


