Re: how to reasonably estimate the disk size for Lucene 4.x

2015-03-24 Thread Gaurav gupta
Erick, when further testing the index sizes using the Lucene APIs (I am using Lucene directly, not through Solr), I found that the index sizes are quite large compared to the formula (I have attached the Excel sheet). But one thing I observe is that the index size increases linearly w.r.t. the no. of
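If the growth really is linear in document count, two sample measurements are enough to project the full size. The sketch below is a hypothetical helper (not part of Lucene): it sums the on-disk bytes of an index directory and fits a straight line through two (docCount, bytes) measurements.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: measures on-disk index size and projects growth,
// assuming the linear scaling observed in the thread holds.
public class IndexSizeProbe {

    // Total size in bytes of every regular file under dir
    // (e.g. the index directory written by IndexWriter).
    public static long directorySizeBytes(Path dir) throws IOException {
        AtomicLong total = new AtomicLong();
        try (var stream = Files.walk(dir)) {
            stream.filter(Files::isRegularFile)
                  .forEach(p -> total.addAndGet(p.toFile().length()));
        }
        return total.get();
    }

    // Given two (docCount, bytes) measurements, projects the size at
    // targetDocs by extending the line through the two points.
    public static long linearProjection(long docs1, long bytes1,
                                        long docs2, long bytes2,
                                        long targetDocs) {
        double slope = (double) (bytes2 - bytes1) / (docs2 - docs1);
        double intercept = bytes1 - slope * docs1;
        return Math.round(slope * targetDocs + intercept);
    }
}
```

For example, if 1,000 docs took 10,000 bytes and 2,000 docs took 20,000 bytes, the projection for 5,000 docs is 50,000 bytes. Measuring right after a forced merge gives steadier numbers, since uncommitted segments inflate the directory size.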

Re: how to reasonably estimate the disk size for Lucene 4.x

2015-03-24 Thread Jack Krupansky
Indexing a fraction of the data, such as 10% or 5%, is probably the best way to do size estimation. The only real caveat is that you need to look at RAM as well. Most modern hardware has huge mass storage capacity relative to the CPU requirements for Lucene to process that data, while IT
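The fraction-based approach described above can be sketched as a small calculation. This is a hypothetical helper, and the safety margin for merge/transient overhead is an assumption of this sketch, not a Lucene-documented constant:

```java
// Hypothetical sizing helper: extrapolates full-index disk usage from a
// sample run over a known fraction of the corpus. safetyMargin (e.g. 0.2
// for 20%) is an assumed cushion for merges; it is not a Lucene constant.
public class SampleSizeEstimator {

    // sampleBytes: index size observed after indexing `fraction`
    // (e.g. 0.10 for 10%) of the full data set.
    public static long estimateFullIndexBytes(long sampleBytes, double fraction,
                                              double safetyMargin) {
        if (fraction <= 0.0 || fraction > 1.0)
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        return Math.round(sampleBytes / fraction * (1.0 + safetyMargin));
    }
}
```

So a 1 MB index built from a 10% sample extrapolates to roughly 10 MB, or 12 MB with a 20% margin. The same sample run is also a chance to watch heap usage, per the RAM caveat above.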

Java versions transition without re-indexing.

2015-03-24 Thread Bogdan Snisar
Hi, folks! This is not a trivial question, but I appeal to your experience with Lucene... Lucene Implementation Version: 2.9.1 Solr Implementation Version: 1.4 Java version: 1.6 This is a legacy environment with a huge amount of indexed data. The main question that I encountered a few days ago

Re: How to merge several Taxonomy indexes

2015-03-24 Thread Gimantha Bandara
Hi Christoph, My mistake. :) It does exactly what I need. Figured it out later. Thanks a lot! On Tue, Mar 24, 2015 at 3:14 AM, Gimantha Bandara giman...@wso2.com wrote: Hi Christoph, I think TaxonomyMergeUtils is to merge a taxonomy directory and an index together (Correct me if I am
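For readers unfamiliar with what a taxonomy merge involves: the core work is ordinal remapping. The toy sketch below is plain Java, not Lucene's `TaxonomyMergeUtils` API; it only illustrates the idea that each source category gets the ordinal it holds in the merged taxonomy, recorded in an ordinal map (for the real signature, consult the `TaxonomyMergeUtils` javadoc for your 4.x release).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration (not Lucene's API) of the ordinal remapping a taxonomy
// merge performs: source categories are added to a target taxonomy, and an
// ordinal map records, per source ordinal, the merged-taxonomy ordinal.
public class ToyTaxonomyMerge {

    // Target taxonomy: category path -> ordinal, assigned in insertion order.
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();

    // Returns the existing ordinal for path, or assigns the next one.
    public int addCategory(String path) {
        return ordinals.computeIfAbsent(path, p -> ordinals.size());
    }

    // Merges a source category list (indexed by its source ordinal) and
    // returns the map srcOrdinal -> mergedOrdinal.
    public int[] merge(List<String> sourceCategories) {
        int[] ordinalMap = new int[sourceCategories.size()];
        for (int srcOrd = 0; srcOrd < sourceCategories.size(); srcOrd++) {
            ordinalMap[srcOrd] = addCategory(sourceCategories.get(srcOrd));
        }
        return ordinalMap;
    }
}
```

Categories already present in the target keep their existing ordinal; only new categories extend the taxonomy, which is why facet ordinals stored in the source index must be rewritten through the map.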

Re: CachingTokenFilter tests fail when using MockTokenizer

2015-03-24 Thread Spyros Kapnissis
Hello Uwe, thanks a lot for your answer. Makes perfect sense - I knew something was wrong with CachingTokenFilter! I will try to modify and adapt the filter to avoid the error as per your instructions. By the way, is there a better way/pattern to use for consuming two (or more) times the
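On consuming a stream more than once: the usual answer is the cache-then-replay pattern that `CachingTokenFilter` implements. The sketch below is a plain-Java toy analog of that pattern (not Lucene's `TokenStream` API): the first pass records tokens from the underlying source, and `reset()` lets later consumers replay the cache without pulling from the source again.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy analog (not Lucene's TokenStream API) of the cache-then-replay
// pattern used by CachingTokenFilter: cache tokens on the first full
// pass, then replay the cache on every subsequent consumption.
public class CachingReplayIterator {

    private final Iterator<String> source;
    private final List<String> cache = new ArrayList<>();
    private int position = 0;
    private boolean replaying = false;

    public CachingReplayIterator(Iterator<String> source) {
        this.source = source;
    }

    // Returns the next token, or null when exhausted.
    // First pass: pull from the source and record each token.
    // Replay passes: serve tokens from the cache only.
    public String next() {
        if (!replaying) {
            if (source.hasNext()) {
                String token = source.next();
                cache.add(token);
                return token;
            }
            replaying = true;              // source exhausted: replay from now on
            position = cache.size();
            return null;
        }
        return position < cache.size() ? cache.get(position++) : null;
    }

    // Rewinds to the start of the cached tokens for another consumer.
    // Caveat of this toy: tokens not yet pulled from the source are dropped,
    // so fully consume the first pass before calling reset().
    public void reset() {
        replaying = true;
        position = 0;
    }
}
```

Usage mirrors the Lucene workflow: one consumer drains the stream to completion, then each further consumer calls `reset()` and iterates again over the same tokens.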