The Lucene config module in 2.4 clusters a
org.apache.lucene.store.RAMDirectory.

I don't have a huge amount of knowledge about the internals of Lucene
(not yet, anyway), so I don't really have any cool insights about
what you suggest. But maybe someone on the dev list might?

--Orion

On Jun 22, 2007, at 11:26 AM, Kunal Bhasin wrote:

> Hey Orion,
>
> Lucene came up in the Italy training and we identified a use case which
> I think makes a lot of sense and can be a pain point for many
> Lucene users.
>
> A question first:
>
> Do we cluster (in the work we have done so far in our Lucene config
> module) just the RAMIndex or both RAMIndex and DiskIndex?
>
> If we cluster both, what is the strategy of clustering DiskIndex?
>
> The pain point identified was that when the index size grows
> exponentially (happens a lot it seems ;)), people like to keep
> their indexes on disk. Now, the problem with distributing is that
> the natural file-based locking does not guarantee that the index
> won't get corrupted (as two threads could have updated the same
> stream and the last one wins). I think it would be great if Terracotta
> could provide distributed locking and thread coordination in this
> case (acquire the same lock on each index) with minimal contention to
> guarantee that indexes don't get corrupted.
>
> I know that they could always rebuild the index from disk in  
> memory, but for very large data, that takes a lot of time.
>
> Also, Terracotta itself provides eviction to disk, so RAMIndex+TC
> should be good enough, but I understand (and I might be wrong) that,
> the way Lucene is designed, if someone is already using the
> DiskIndex, it is a lot of rework (almost a complete redesign) to
> move to RAMIndex.
>
> Any thoughts?
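
The "acquire the same lock on each index" idea can be sketched as a registry holding exactly one lock per index name, with every writer taking that lock before touching the index. This is a minimal sketch, not Terracotta's or Lucene's API: the class and method names (IndexLockRegistry, lockFor, withIndexLock) are hypothetical, and it only demonstrates the coordination pattern inside one JVM. The assumption is that, under Terracotta, the lock map would be declared a clustered root so the same locks are honored on every node.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: one lock per index name, shared by all writers.
 * ASSUMPTION: under Terracotta, "locks" would be configured as a clustered
 * root so writers on every node serialize on the same per-index lock.
 * As written, this only coordinates threads within a single JVM.
 */
public class IndexLockRegistry {
    // computeIfAbsent guarantees a single lock instance per index name.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public ReentrantLock lockFor(String indexName) {
        return locks.computeIfAbsent(indexName, k -> new ReentrantLock());
    }

    /** Runs an index update while holding that index's lock, so updates never interleave. */
    public void withIndexLock(String indexName, Runnable update) {
        ReentrantLock lock = lockFor(indexName);
        lock.lock();
        try {
            update.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        IndexLockRegistry registry = new IndexLockRegistry();
        int[] updates = {0}; // stands in for writes to one on-disk index

        // Two writers hammer the same index; the shared lock prevents lost updates.
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) {
                registry.withIndexLock("products", () -> updates[0]++);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(updates[0]); // 20000
    }
}
```

Locking per index rather than globally keeps contention low: writers to different indexes never block each other, while writers to the same index are serialized, which is the "last one wins" case described above.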

_______________________________________________
tc-dev mailing list
[email protected]
http://lists.terracotta.org/mailman/listinfo/tc-dev
