Hi guys,

I am still a CouchDB beginner, but I have read a lot about it in the meantime, 
and also about NoSQL in general.
But I still struggle with the question of how to handle duplicated data, 
especially how to update duplicated data that is "distributed" across 
the entire database.

Consider the following scenario, just as an example:

Imagine there is an issue document which has a relation to an enumeral (a 
particular item of an enumeration).

// Issue doc
{
    _id: "issue.foo",
    type: "issue",
    title: "No joins with NoSQL",
    // Duplicated "enumeral":
    status: {
        _id: "enumeral.status.in-progress",
        type: "enumeral",
        title: "In progres"
    }
}

// Original "enumeral" doc
{
    _id: "enumeral.status.in-progress",
    type: "enumeral",
    title: "In progres"
}

Now we discover a typo in our enumeral:
The title needs to be changed from "In progres" to "In progress".
And of course, all duplicates need to be updated as well (a few thousand docs 
could be affected).

To achieve this, I would create a background task at the application level 
like this:
- Search for all affected documents.
- Replace/update the title in each doc.
- Bulk update all docs.
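
The three steps above could look roughly like the sketch below. It is only an illustration under assumptions: the helper names are made up, the selector is meant for CouchDB's Mango query endpoint (POST /{db}/_find), and the resulting body is meant for POST /{db}/_bulk_docs. The HTTP calls themselves are left out so that the snippet runs on its own.

```javascript
// Hypothetical helpers; only the document shapes come from the example above.

// Step 1: a Mango selector that could be POSTed to /{db}/_find
// to locate all issues embedding the given enumeral.
function affectedIssuesSelector(enumeralId) {
  return { type: "issue", "status._id": enumeralId };
}

// Step 2: return updated copies of the docs whose embedded enumeral is stale.
// Docs that already carry the current title are skipped, so they do not
// receive a needless new revision.
function refreshEmbeddedEnumeral(issueDocs, enumeral) {
  return issueDocs
    .filter((doc) => doc.status && doc.status._id === enumeral._id)
    .filter((doc) => doc.status.title !== enumeral.title)
    .map((doc) => ({
      ...doc, // keeps _id and _rev, which an update via _bulk_docs requires
      status: { _id: enumeral._id, type: enumeral.type, title: enumeral.title },
    }));
}

// Step 3: the request body for POST /{db}/_bulk_docs.
function bulkDocsBody(docs) {
  return { docs };
}

// Usage with in-memory sample data (the _rev values are invented).
const fixedEnumeral = {
  _id: "enumeral.status.in-progress",
  type: "enumeral",
  title: "In progress",
};
const fetchedIssues = [
  {
    _id: "issue.foo",
    _rev: "1-abc",
    type: "issue",
    title: "No joins with NoSQL",
    status: { _id: "enumeral.status.in-progress", type: "enumeral", title: "In progres" },
  },
];
console.log(JSON.stringify(affectedIssuesSelector(fixedEnumeral._id)));
console.log(JSON.stringify(bulkDocsBody(refreshEmbeddedEnumeral(fetchedIssues, fixedEnumeral)), null, 2));
```

Skipping docs that are already up to date keeps the number of new revisions, and therefore the number of possible conflicts, as small as possible.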

This might work, but now we have created a new revision for each affected document.
If a user is working on the previous revision, they will run into a conflict when saving.
Normally this is perfectly fine, but in this case it was a background task that 
changed the document, 
and the user shouldn't have to deal with such a minor change.
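
One conceivable way to keep the user out of it is to resolve this kind of conflict automatically at application level. The sketch below is only an assumption about how that might look, not an established CouchDB feature: after a save is rejected with 409 Conflict, the (hypothetical) helper compares the revision the user started from with the current one, and if nothing but `_rev` and the embedded `status` differs, it rebases the user's edit onto the current revision.

```javascript
// Canonical form with sorted keys, so comparison does not depend on key order.
const canonical = (v) =>
  Array.isArray(v)
    ? v.map(canonical)
    : v && typeof v === "object"
      ? Object.fromEntries(Object.keys(v).sort().map((k) => [k, canonical(v[k])]))
      : v;

const sameContent = (a, b) =>
  JSON.stringify(canonical(a)) === JSON.stringify(canonical(b));

// Drops the fields the background task is allowed to touch.
const withoutRevAndStatus = ({ _rev, status, ...rest }) => rest;

// Hypothetical helper, called after a 409 Conflict.
//   base    = the revision the user started editing from
//   mine    = the user's edited copy (still carrying base's _rev)
//   current = the latest revision now stored in the database
// Returns a doc that can be saved again, or null if a real conflict exists.
function rebaseIfOnlyStatusChanged(base, mine, current) {
  if (!sameContent(withoutRevAndStatus(base), withoutRevAndStatus(current))) {
    return null; // someone else made a real edit; show the conflict
  }
  // If the user picked a different status, their choice wins;
  // otherwise take the refreshed enumeral from the current revision.
  const userChangedStatus = mine.status?._id !== base.status?._id;
  return {
    ...mine,
    _rev: current._rev,
    status: userChangedStatus ? mine.status : current.status,
  };
}

// Usage with invented revisions:
const staleStatus = { _id: "enumeral.status.in-progress", type: "enumeral", title: "In progres" };
const freshStatus = { ...staleStatus, title: "In progress" };
const baseDoc = { _id: "issue.foo", _rev: "1-abc", type: "issue", title: "No joins with NoSQL", status: staleStatus };
const userDoc = { ...baseDoc, title: "No joins in NoSQL" };
const currentDoc = { ...baseDoc, _rev: "2-def", status: freshStatus };
console.log(rebaseIfOnlyStatusChanged(baseDoc, userDoc, currentDoc));
```

This requires the application to keep the revision the user started from, which is an extra assumption about the client.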

As an alternative, instead of duplicating the "enumeral" doc, I tried to "join" 
it using "linked documents" or other kinds of views.
But then I get at least two separate docs instead of one cleanly formatted JSON 
doc.
The result has to be assembled at the application level, which doesn't feel right 
to me.
Moreover, it also doesn't feel right to complicate the schema more than 
necessary.
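
To make the "linked documents" variant concrete, here is a rough sketch. It assumes the issue stores only a reference such as status: { _id: "enumeral.status.in-progress" }, and that the view is queried with include_docs=true, in which case CouchDB returns the document named by the emitted value's _id instead of the emitting document. The map function is shown as a comment because `emit` only exists inside CouchDB; the merge helper is a hypothetical name.

```javascript
// Map function as it might be stored in a design document:
//
//   function (doc) {
//     if (doc.type === "issue") {
//       emit([doc._id, 0], null);                     // row carries the issue itself
//       if (doc.status && doc.status._id) {
//         emit([doc._id, 1], { _id: doc.status._id }); // row carries the linked enumeral
//       }
//     }
//   }

// Application-level step: fold each pair of adjacent rows into one document.
// Relies on the view's key order, which puts [id, 0] directly before [id, 1].
function mergeLinkedRows(rows) {
  const issues = [];
  for (const row of rows) {
    const [issueId, part] = row.key;
    if (part === 0) {
      issues.push({ ...row.doc });
    } else if (part === 1) {
      const last = issues[issues.length - 1];
      // row.doc is null if the linked enumeral does not exist;
      // in that case the bare reference is left in place.
      if (last && last._id === issueId && row.doc) {
        last.status = row.doc;
      }
    }
  }
  return issues;
}

// Usage with rows shaped like a view response (sample data only):
const sampleRows = [
  {
    id: "issue.foo",
    key: ["issue.foo", 0],
    value: null,
    doc: { _id: "issue.foo", type: "issue", title: "No joins with NoSQL", status: { _id: "enumeral.status.in-progress" } },
  },
  {
    id: "issue.foo",
    key: ["issue.foo", 1],
    value: { _id: "enumeral.status.in-progress" },
    doc: { _id: "enumeral.status.in-progress", type: "enumeral", title: "In progress" },
  },
];
console.log(JSON.stringify(mergeLinkedRows(sampleRows), null, 2));
```

With this layout a typo fix touches only the one enumeral doc, at the price of the merge step shown here.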

So I would like to ask the experienced people here how you deal with such things.
There might not be "the one and only" solution, but I would like to find out whether I 
am completely on the wrong path or not. Maybe I have to change my way of thinking, 
I'm not sure.

Please also tell me if there is no way out of this dilemma; that would be really 
helpful too ;-)

Many thanks in advance!

Olaf


