Storing immutable docs & accessing all docs as a reduction

Ronan Jouchet Tue, 05 Jan 2016 11:05:46 -0800

Hi couchdb-users!

After learning about Datomic [DAT] and reading advice like [CLD], I am experimenting with considering my Couch documents immutable and would love your feedback.

I'm not talking about Couch's underlying data structures; I'm talking about never updating/deleting documents. That is, instead of modeling a 2-user document modification scenario as:


- PUT doc123 by alice with:
  {text: "hello"}

- PUT doc123 by bob with:
  {text: "hello world", rev: "rev_of_initial_rev"}

I would be modeling the same scenario as a series of "facts":

- PUT doc123-create by alice with:
  {text: "hello", created_at: "2016-01-01"}

- PUT doc123-modification-1fca291d by bob with:
  {text: "hello world", created_at: "2016-01-02"}

Similarly, operations like deletion, un-deletion, etc. can be covered through conventions (stored as fields or part of the document _id).


Then, to access document "doc123",

- In the vanilla revision-based world, it's simply:
  GET .../mydb/doc123
      or a call to _all_docs with key/startkey/endkey clauses as needed

- In the immutable world, a view does the reduction:
  GET .../mydb/_design/case/_view/immutable_docs_?group_level=1
      (with key/startkey/endkey clauses as needed)

  In my implementation, a map function emits composite keys:
    `emit([doc.doc_id, doc.created_at])`

  ... thus yielding (post-map, pre-reduce):
      key: ["doc123", "2016-01-01"], value: {text: "hello"}
      key: ["doc123", "2016-01-02"], value: {text: "hello world"}

  Then with "group_level=1" I am able to reduce these two "facts",
  using a reduce function that starts with an empty object and
  applies successive changes, ending up with the reduced object:
      key: ["doc123"], value: {text: "hello world"}
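  For concreteness, here is a minimal sketch of the map/reduce pair
  described above. It is not my exact prototype: the emitted value
  `{text: doc.text}` is an assumption (the `emit` call quoted earlier
  omits it), and the `rows`/`emit` stand-ins at the bottom are
  hypothetical helpers that only exist so the sketch can be tried outside
  CouchDB. Note that the reduce relies on `values` arriving in ascending
  `created_at` order.

```javascript
// Map: one row per fact, keyed by [logical doc id, timestamp].
// `doc_id` and `created_at` are the field conventions used in this post.
function mapFact(doc) {
  if (doc.doc_id && doc.created_at) {
    emit([doc.doc_id, doc.created_at], { text: doc.text }); // value shape is assumed
  }
}

// Reduce: start from an empty object and apply each fact in turn.
// Later entries in `values` overwrite earlier ones, so input order matters.
function applyFacts(keys, values, rereduce) {
  var result = {};
  for (var i = 0; i < values.length; i++) {
    for (var field in values[i]) {
      result[field] = values[i][field];
    }
  }
  return result;
}

// Local stand-in for the view server (hypothetical, for experimentation only).
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

[
  { _id: "doc123-create", doc_id: "doc123",
    text: "hello", created_at: "2016-01-01" },
  { _id: "doc123-modification-1fca291d", doc_id: "doc123",
    text: "hello world", created_at: "2016-01-02" }
].forEach(mapFact);

var reduced = applyFacts(null, rows.map(function (r) { return r.value; }), false);
console.log(JSON.stringify(reduced)); // prints {"text":"hello world"}
```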

I have a prototype that does just that, and it seems to work.
Now, considering the following hypotheses:

- I understand it means more work to access data
  (these views are not going to build themselves)

- Volume-wise, I'm not expecting millions of documents, and let's
  assume at worst 100 "facts" per document.

- Unlike in [CLD], our motivation for trying immutability is not
  frequently-changing data: we anticipate slowly changing data
  (e.g. >10s between changes to the same document). We're more
  interested in traceability benefits for a regulated environment
  (no deletions, and an audit log / history becomes trivial:
   just don't reduce).

- The immutability model seems to have been tried in the Couch
  universe, as it sounds similar to Cloudant's advice [CLD] to
  "Consider Immutable Data" where, as they say, "data models based on
  immutable data require the use of views to summarize the documents
  which comprise the current state". Except, again, we'd be in for the
  traceability, not conflict avoidance.
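
To make the "just don't reduce" point concrete, here is a small sketch of how the history query for one document could be built. It assumes the composite `[doc_id, created_at]` keys described above; `historyUrl` is a hypothetical helper name, and the host in the usage line is a placeholder, not our real server. The `{}` in `endkey` works because objects collate after strings in CouchDB's view ordering.

```javascript
// Build a view URL returning every fact for one logical document,
// unreduced and in key order (i.e. the audit log for that document).
function historyUrl(viewUrl, docId) {
  var params = [
    "reduce=false",
    // All keys of the form [docId, <anything>]:
    "startkey=" + encodeURIComponent(JSON.stringify([docId])),
    "endkey=" + encodeURIComponent(JSON.stringify([docId, {}]))
  ];
  return viewUrl + "?" + params.join("&");
}

// Usage with a placeholder host:
console.log(historyUrl(
  "http://localhost:5984/mydb/_design/case/_view/immutable_docs_",
  "doc123"
));
```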

My questions are:

1. Does it sound like a good idea, or is it perverting CouchDB's model
   and should we stick with good ol' revisions?

2. *If* that doesn't sound abysmally perverted, I have a key ordering
   problem with my reduction function proof-of-concept:

   a. The results of my map emitting composite keys are correctly
      sorted at group_level=2:
      ["doc123", "2016-01-01"], ["doc123", "2016-01-02"].

   b. But as soon as I start reducing at group_level=1, the `values`
      passed to my reduce function seem to always be in the reverse
      order. That is, a `log(value.created_at);` in the body of my
      reduce function will print:
          2016-01-02
          2016-01-01
      I expected the opposite! And as mentioned in point a., the rows
      are correctly sorted at group_level=2 (ungrouped)! Note that the
      values are not lexicographically sorted; they seem to always be
      reverse-sorted by the key element at position group_level+1.

      Now, if I pass `&descending=true` to my view, the view becomes
      reversed (I'll see doc456 before doc123, which I do *not* want),
      but now the same logging in my reduce function correctly prints:
          2016-01-01
          2016-01-02

   -> Any idea what's wrong and how I can work around this? In
      particular, I'd expect that:
      - level=1 order should *not* be reversed
      - level=2 reduction should be done with reduce `values` ordered
        by level=2 composite key (and not reversed), without me
        manually re-sorting `values`, which was already done by Couch!
Thanks for your help, and happy new year! :)

References:

[DAT] http://www.datomic.com/

[CLD] https://cloudant.com/blog/my-top-5-tips-for-modelling-your-data-to-scale/


--
Ronan
