Hi,

Just an idea on how to manage a large index that is updated very often.

Very often there is a need to update a document in an index. To update a document, you have to delete the old document from the index and then add the new one. In most cases this requires you to open an IndexReader, delete the document, close the IndexReader, create an IndexWriter, add the document, close the IndexWriter, and reopen the IndexSearcher (if the index is searched heavily). Profiling some applications, I found that most of the time is spent in the IndexReader.open() method. It also creates many objects, which adds GC overhead.

The idea for optimizing this process is to create two indexes: one main index, which can be very large, and a second index that serves as a "change buffer". We can keep one IndexReader open for the main index and use it both for searching and for deleting old documents. The second index is small, so we can reopen its IndexReader frequently, whenever needed.
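
To make the update path a bit more concrete, here is a rough sketch. It is only a model, not Lucene code: plain Java maps stand in for the two indexes, and the class and member names (TwoIndexUpdater, update, get, bufferReopens) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model: maps stand in for the two Lucene indexes.
class TwoIndexUpdater {
    final Map<String, String> main = new HashMap<>();   // large index, one reader kept open
    final Set<String> mainDeletions = new HashSet<>();  // deletions made through that reader
    final Map<String, String> buffer = new HashMap<>(); // small "change buffer" index
    int bufferReopens = 0;                              // counts cheap reopens of the small index

    // An update = delete the old copy via the open main reader + add the new copy to the buffer.
    void update(String id, String doc) {
        if (main.containsKey(id)) {
            mainDeletions.add(id);   // the main index itself is not reopened
        }
        buffer.put(id, doc);         // replaces any older buffered version of the same id
        bufferReopens++;             // only the small index's reader has to be reopened
    }

    // The live version of a document, as a combined search would see it.
    String get(String id) {
        if (buffer.containsKey(id)) {
            return buffer.get(id);
        }
        return mainDeletions.contains(id) ? null : main.get(id);
    }
}
```

The large index is never reopened on the update path; only deletions are recorded against it, and all new documents land in the small index.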

When the second index reaches a certain number of documents, we can merge it into the main index.
To search this "multi" index we could use a MultiSearcher over the two indexes, with one small trick: the first IndexSearcher is kept the same the whole time until the second index is merged into the main one, while the second IndexSearcher is reopened whenever the second index changes.
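
A similarly hypothetical sketch of the search and merge side: a snapshot of the main map plays the role of the long-lived first IndexSearcher, the buffer is always read fresh, and mergeThreshold is an assumed tuning parameter. BufferedIndex and its methods are made-up names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the "multi" search plus periodic merge; maps stand in for indexes.
class BufferedIndex {
    private final int mergeThreshold;                       // assumed tuning knob
    private final Map<String, String> main = new HashMap<>();
    private final Set<String> mainDeleted = new HashSet<>();
    private final Map<String, String> buffer = new HashMap<>();
    private Map<String, String> mainSnapshot = Map.copyOf(main); // long-lived "first searcher"
    int mainReopens = 0;                                     // expensive reopens of the main index

    BufferedIndex(int mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    void update(String id, String doc) {
        mainDeleted.add(id);   // delete old copy through the open main reader
        buffer.put(id, doc);   // new copy goes to the small index
        if (buffer.size() >= mergeThreshold) {
            mergeIntoMain();
        }
    }

    // Fold the buffer into the main index; only here is the main searcher reopened.
    private void mergeIntoMain() {
        for (String id : mainDeleted) {
            main.remove(id);
        }
        main.putAll(buffer);
        mainDeleted.clear();
        buffer.clear();
        mainSnapshot = Map.copyOf(main);
        mainReopens++;
    }

    // Search both: the unchanged main snapshot (minus deletions) plus the current buffer.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : mainSnapshot.entrySet()) {
            if (!mainDeleted.contains(e.getKey()) && e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        for (Map.Entry<String, String> e : buffer.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

In this model mainReopens only grows on a merge, so the expensive open of the large index happens once per mergeThreshold updates instead of once per update.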


It is just an idea (it has not been tested).

Will this help improve the speed of updating a large index and lower the memory overhead?
Any comments?


Regards,
Volodymyr Bychkoviak