Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader &.FilterIndex

Erick Erickson Wed, 03 Aug 2011 06:57:33 -0700

How are these fields used? If they're not used for searching, you could
put them in their own core, rebuild that index at your whim, and query that
core when you need the relationship information.


If you have a DB backing your system, you could perhaps store the info there
and query that (but I like the second core better <G>).

But if you could use a separate index just for the relationships, you wouldn't
have to deal with the slow re-indexing of all the docs...
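
To make the two-step lookup concrete, here is a rough sketch (not tested
against a real Solr) of building the request for the side core after the main
search has returned its doc ids. The core name "links", the id field "id" and
the field names "link_a"/"link_b" are placeholders I made up; only the
/select handler and the q, fl, rows and wt parameters are standard Solr.

```python
from urllib.parse import urlencode


def link_lookup_url(base_url, core, doc_ids,
                    id_field="id", link_fields=("link_a", "link_b")):
    """Build a Solr /select URL that fetches only the link fields
    for the given ids from a separate (hypothetical) side core."""
    # Quote each id as a phrase so special characters are not parsed
    # as query syntax; escape backslashes and embedded quotes.
    quoted = ['"{}"'.format(d.replace('\\', '\\\\').replace('"', '\\"'))
              for d in doc_ids]
    params = {
        "q": "{}:({})".format(id_field, " OR ".join(quoted)),
        "fl": ",".join((id_field,) + tuple(link_fields)),  # only what we need
        "rows": len(doc_ids),                              # one row per id
        "wt": "json",
    }
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core,
                                    urlencode(params))


if __name__ == "__main__":
    # Ids as they might come back from the main core's result page.
    print(link_lookup_url("http://localhost:8983/solr", "links",
                          ["doc1", "doc2"]))
```

The point of the design is that the big main index is never touched: only the
small side core gets rebuilt on the monthly (or weekly) schedule.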

Best
Erick

On Mon, Aug 1, 2011 at 4:12 AM,  <karsten-s...@gmx.de> wrote:
> Hi lucene/solr-folk,
>
> Issue:
> Our documents are stable except for two fields which are used for linking
> between the docs. So we would like to update these two fields in a batch once a
> month (possibly once a week).
> We cannot reindex all docs once a month, because we are using XeLDA in some
> fields for stemming (morphological analysis), and XeLDA is slow. We have 14
> million docs (less than 100 GByte main index and 3 GByte for these two changeable
> fields).
> In the next half year we will be migrating our search engine from Verity K2 to
> Solr, so we could wait for Solr 4.0
> (btw, any news about
> http://lucene.472066.n3.nabble.com/Release-schedule-Lucene-4-td2256958.html
> ?).
>
> Solution?
>
> Our issue is exactly the purpose of ParallelReader.
> But Solr does not support ParallelReader (for a good reason:
> http://lucene.472066.n3.nabble.com/Vertical-Partitioning-advice-td494623.html#a494624
> ).
> So I see two possible ways to solve our issue:
> 1. Wait for the new parallel incremental indexing
> (
> https://issues.apache.org/jira/browse/LUCENE-1879
> ) and hope that Solr will integrate it.
> Pro:
>  - nothing to do for us except waiting.
> Contra:
>  - I did not find anything of the (old) patch in current trunk.
>
> 2. Change the Lucene index below/without Solr in a batch:
>   a) Each month generate a new index only with our two changed fields
>      (e.g. with DIH)
>   b) Use FilterIndex and ParallelReader to mock a correct index
>   c) “Merge” this mock index into a new index
>      (via IndexWriter.addIndexes(IndexReader...) )
> Pro:
>  - The patch for https://issues.apache.org/jira/browse/LUCENE-1812
>   should be a good example of how to do this.
> Contra:
>  - The relation between DocId and document index order is not a guaranteed
> feature of DIH (e.g. we will have to split the main index to ensure that no
> merge will occur in/after DIH).
>  - To run this batch, Solr has to be stopped and restarted.
>  - Even if we know that our two fields should change only for a subset of the
> docs, we nevertheless have to reindex these two fields for all the docs.
>
> Any comments, hints or tips?
> Is there a third (better) way to solve our issue?
> Is there already a working example of the 2nd solution?
> Will LUCENE-1879 (Parallel incremental indexing) be part of Solr 4.0?
>
> Best regards
>  Karsten
>
