Re: Forcing document reindex

Cliff Williams Wed, 17 Nov 2010 09:24:03 -0800

Nicolas,

I am not sure I fully understand your use case (though it does sound intriguing and unusual).


A couple of things stick out in your commentary:

"The data is only weakly relational."
"DB updates are relatively few"

I assume that you are getting data out of your legacy MySQL system using complex joins?

Have you considered totally denormalising your data and loading it into CouchDB based on the output of your MySQL reports? Perhaps couchdb-lucene (or my current favourite, elasticsearch, which is also based on Lucene) would be useful?
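To make the denormalising idea concrete, here is a minimal sketch in Python. It assumes the MySQL report already yields one flat row per A/B/E combination. The column names (`a_id`, `a_name`, `b_label`, `e_value`) and the document layout are hypothetical, chosen only for illustration.

```python
def denormalise(rows):
    """Fold flat report rows into one self-contained document per A record."""
    docs = {}
    for row in rows:
        # The first row seen for an A record creates its document; A and B
        # fields are copied once, since they repeat on every row for that A.
        doc = docs.setdefault(row["a_id"], {
            "_id": "a-%s" % row["a_id"],      # hypothetical _id scheme
            "type": "report_row",             # hypothetical type marker
            "a": {"name": row["a_name"]},
            "b": {"label": row["b_label"]},
            "e": [],
        })
        # Each row contributes one E entry to the embedded list.
        doc["e"].append({"value": row["e_value"]})
    return list(docs.values())
```

The resulting list could be wrapped as `{"docs": [...]}` and posted to the database's `_bulk_docs` endpoint. A view over these documents then needs no cross-document lookups, because everything the report reads from A, B and E sits in one document.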

If neither of these suggestions is of any use, could you post a more detailed description (with a data sample if possible) of:

"The hiccup is reporting. Some of it involves the full set of documents. Let's
say I have 5 categories of documents involved in a report, A to E. A links to B,
B links to C, etc. The report needs data from A, B, and E. As far as I can
think, there's no way to do a view collation, because A and B share an ID but E
doesn't. I can't pull a million documents from the DB to process elsewhere
either, so that nixes simple indexing and the '_id' object values."


Very best regards

Cliff

On 17/11/10 16:13, Nicolas Jessus wrote:

All right; no one should like what they're going to read.

I have a medium-sized MySQL system, which translates to a Couch with about a
million documents of about 20 types. The system would really benefit from a
schema-free design. The data is only weakly relational. Couch would fit really
well, enough that I don't mind twisting its arm in a few places if need be; the
tradeoff would be worth it.

The hiccup is reporting. Some of it involves the full set of documents. Let's
say I have 5 categories of documents involved in a report, A to E. A links to B,
B links to C, etc. The report needs data from A, B, and E. As far as I can
think, there's no way to do a view collation, because A and B share an ID but E
doesn't. I can't pull a million documents from the DB to process elsewhere
either, so that nixes simple indexing and the '_id' object values.

I could however write a special view_server that will emit keys after checking
the linked ID through an HTTP call (that's where you scream). Indexing
performance is totally unimportant to me, DB updates are relatively few, and I
can live with the dirty side-effects (again, the system as a whole would still
be much cleaner than the MySQL one).

With that solution I can have a map function that just handles docs of type A.
But I still need to reindex the relevant As when B or E changes. I could simply
listen to the change stream and force a reindex, but that doesn't work well with
legitimate updates when the _rev number goes up at random even though the doc
hasn't changed, and there's no auto-merge. So I'm pretty stuck.
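As an illustration of the "reindex the relevant As" step, the sketch below works out which type-A documents depend on a changed document by walking the links backwards. It is only a sketch under assumptions: each document is taken to carry a hypothetical `link` field holding the `_id` of the next document in the chain and a hypothetical `type` field, and all documents are held in memory, which the message above rules out at full scale.

```python
from collections import defaultdict

def build_reverse_links(docs):
    """Map each linked-to _id to the set of _ids of documents linking to it."""
    reverse = defaultdict(set)
    for doc in docs:
        target = doc.get("link")  # hypothetical field: _id of the next doc
        if target is not None:
            reverse[target].add(doc["_id"])
    return reverse

def affected_a_ids(changed_id, docs_by_id, reverse):
    """Walk links backwards from a changed doc, collecting type-A ancestors."""
    seen, stack, result = set(), [changed_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and repeated visits
        seen.add(current)
        doc = docs_by_id.get(current)
        if doc is not None and doc.get("type") == "A":
            result.add(current)
        # Continue towards the documents that point at the current one.
        stack.extend(reverse.get(current, ()))
    return result
```

A listener on the `_changes` feed could pass each changed `_id` through `affected_a_ids` and re-save only the returned documents. That re-save is exactly what bumps `_rev` without a real change, so the objection raised above still applies.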

I'm not asking that this type of functionality be encouraged. It's clearly
subverting the point of Couch. On the other hand, it doesn't seem like having a
force-reindex function would dirty the concept, and if it's easy to code, then
it's a shame it doesn't exist.
