Re: How to deal with updates of denormalized, duplicated data?

ermouth Tue, 01 Sep 2020 09:56:50 -0700

You might not need a background task. Since your doc _id:"issue.foo" knows
the exact _id of the related doc, you can emit that dependency from your
view, i.e.:


views: {foo: {
  map: function (doc) {
    if (doc.type == 'issue') {
      // Emitting {_id: ...} as the value links the row to the dependency doc.
      emit(doc._id, {_id: doc.status._id});
    }
  }
}}

Querying this view with include_docs=true returns the body of the dependency
doc of type "enumeral" instead of the parent doc of type "issue". Read more at
https://docs.couchdb.org/en/3.1.0/ddocs/views/joins.html#linked-documents

Don’t expect this method to be fast: it reads the actual docs each time the
query runs.
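
A minimal sketch of consuming such a response on the client. It assumes the
usual row shape for include_docs=true (row.id is the issue that emitted the
row, row.doc is the linked enumeral). The helper name statusTitleByIssue and
the sample rows are made up for illustration:

```javascript
// Hypothetical helper: maps each issue _id to the title of its linked
// "enumeral" doc, given the rows of the view queried with include_docs=true.
function statusTitleByIssue(rows) {
  const result = {};
  for (const row of rows) {
    // row.doc is the linked doc fetched through the emitted {_id: ...};
    // it may be missing if the linked doc does not exist.
    result[row.id] = row.doc ? row.doc.title : null;
  }
  return result;
}

// Sample rows, shaped like a view response (illustrative data only).
const sampleRows = [
  {
    id: "issue.foo",
    key: "issue.foo",
    value: { _id: "enumeral.status.in-progress" },
    doc: { _id: "enumeral.status.in-progress", type: "enumeral", title: "In progress" }
  }
];

console.log(statusTitleByIssue(sampleRows));
```

With this layout the title lives only in the enumeral doc, so fixing the
typo means updating a single document.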

There is also another method, more limited and tricky but sometimes
faster. Enumerals can be emitted into the same index as issues, but under a
special range of keys. The view also needs a reducer that expands issues
with enumerals. Since your reducer is no longer a reducer but an expander,
and bloats data server-side, you need to set reduce_limit=false in the
CouchDB config.

When you query that kind of issues-related view, you can add the enumerals
key range to the query, and the reducer joins everything together. Or do the
reduce on the client side.
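
A rough sketch of the client-side variant, under assumptions: the key layout
(['enumeral', id] and ['issue', id]), the emitted values, and the names mapFn
and joinRows are all made up for illustration. The emit() defined here is only
a stand-in so the sketch runs outside CouchDB:

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs outside the query server.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map function: both doc types go into one index, each under its own key range.
function mapFn(doc) {
  if (doc.type == 'enumeral') {
    emit(['enumeral', doc._id], { title: doc.title });
  } else if (doc.type == 'issue') {
    emit(['issue', doc._id], { title: doc.title, status: doc.status._id });
  }
}

// Client-side "reduce": expands each issue row with its enumeral row.
function joinRows(viewRows) {
  const enumerals = {};
  const issues = [];
  for (const row of viewRows) {
    const [kind, id] = row.key;
    if (kind === 'enumeral') enumerals[id] = row.value;
    else if (kind === 'issue') issues.push({ id: id, value: row.value });
  }
  return issues.map(({ id, value }) => ({
    _id: id,
    type: 'issue',
    title: value.title,
    // An unknown status id simply yields a status without a title.
    status: { _id: value.status, type: 'enumeral', ...enumerals[value.status] }
  }));
}

// Illustrative docs: the issue stores only the _id of its status.
[
  { _id: 'enumeral.status.in-progress', type: 'enumeral', title: 'In progress' },
  { _id: 'issue.foo', type: 'issue', title: 'No joins with NoSQL',
    status: { _id: 'enumeral.status.in-progress' } }
].forEach(mapFn);

const joined = joinRows(rows);
console.log(JSON.stringify(joined, null, 2));
```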

There is yet another method, using a _list function that recombines the
view response in the same manner as the reducer described above. That sort
of _list is easier to implement (no rereduce complexities) and, unlike a
reducer, it can pipe data. Note that lists may carry some speed penalty, and
they are deprecated.
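
A hedged sketch of what such a _list function might look like. It assumes a
view whose rows carry keys ['enumeral', id] and ['issue', id], so that the
enumeral rows arrive before the issue rows in key order. That key layout and
the name listFn are illustrative; getRow() and send() are stubbed here so the
sketch runs outside CouchDB:

```javascript
// Stubs for getRow() and send(), which CouchDB provides inside a _list
// function; the sample rows are illustrative only.
const pendingRows = [
  { key: ['enumeral', 'enumeral.status.in-progress'], value: { title: 'In progress' } },
  { key: ['issue', 'issue.foo'],
    value: { title: 'No joins with NoSQL', status: 'enumeral.status.in-progress' } }
];
const chunks = [];
function getRow() { return pendingRows.shift() || null; }
function send(chunk) { chunks.push(chunk); }

function listFn(head, req) {
  var enumerals = {};
  var first = true;
  var row;
  send('[');
  while ((row = getRow())) {
    var kind = row.key[0];
    var id = row.key[1];
    if (kind === 'enumeral') {
      // Remember enumerals; nothing is sent for them.
      enumerals[id] = row.value;
    } else if (kind === 'issue') {
      // Expand the issue and send it right away, one row at a time.
      var status = Object.assign({ _id: row.value.status }, enumerals[row.value.status]);
      var issue = { _id: id, title: row.value.title, status: status };
      send((first ? '' : ',') + JSON.stringify(issue));
      first = false;
    }
  }
  send(']');
}

listFn({}, {});
const listOutput = JSON.parse(chunks.join(''));
console.log(listOutput);
```

Only the enumerals are held in memory; each issue is sent as soon as its row
is read, which is the piping advantage over a reducer.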

Best regards,
ermouth


Tue, 1 Sep 2020 at 17:51, Olaf Krüger <[email protected]>:

> Hi guys,
>
> I am still a CouchDB beginner, but I have read a lot about it in the
> meantime, and also about NoSQL in general.
> But I still struggle with the question of how to handle duplicated data,
> especially updating duplicated data that is "distributed"
> across the entire database.
>
> Following scenario, just as an example:
>
> Imagine there is an issue document which has a relation to an enumeral (A
> particular item of an enumeration).
>
> // Issue doc
> {
>     _id: "issue.foo",
>     type: "issue",
>     title: "No joins with NoSQL",
>     // Duplicated "enumeral":
>     status: {
>          _id: "enumeral.status.in-progress",
>         type: "enumeral",
>         title: "In progres"
>     }
> }
>
> // Origin "enumeral" doc
> {
>      _id: "enumeral.status.in-progress",
>     type: "enumeral",
>     title: "In progres"
> }
>
> Now, we detect a typo at our enumeral:
> The title needs to be updated from "In progres" to "In progress"
> afterwards.
> And of course, all duplicates need to be updated too (a few thousand
> docs could be affected).
>
> In order to achieve this, I would create a background task at application
> level like this:
> - Search for all affected documents.
> - Replace/update the title of each particular doc
> - Bulk update all docs
>
> This might work, but now we created a new revision for each affected
> document.
> If a user works on the previous revision, he will run into a conflict.
> Basically, this is pretty OK, but in this case, a background task has
> changed the document
> and in this special case, the user shouldn't have to care about this little
> change.
>
> As an alternative, instead of duplicating the "enumeral" doc, I tried to
> "join" it by using "linked documents" or other kind of views.
> But then, I got at least two different docs instead of one clean formatted
> JSON doc.
> The result needs to be formatted at application level which doesn't feel
> right to me.
> Moreover, it also doesn't feel right to me to complicate the schema more
> than needed.
>
> So, I would like to ask you experienced guys how you deal with such
> things.
> There might not be "the one and only" solution, but I would like to find out
> whether I am totally on the wrong path or not. Maybe I have to change my
> thinking, not sure.
>
> Please also confirm if there's no way out of the dilemma, this would be
> really helpful too ;-)
>
> Many thanks in advance!
>
> Olaf
>
>
>
>
