Re: content disappears in the index

2012-11-13 Thread Bernd Fehling
Hi Geoff, cool, that will eliminate possible regex pitfalls in schema.xml. I was thinking about enhancing an existing filter as a multi-purpose filter. E.g. TrimFilter: if maxLength is set, then also limit the termAtt to maxLength. This will keep the number of available filters small, especially for s
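The trim-then-limit idea can be sketched without any Lucene classes. This is only an illustration of the per-term logic, not TrimFilter's code: the class name `TrimAndLimit`, the method `apply`, and the convention that a negative `maxLength` means "no limit" are all assumptions made for this example.

```java
// Hypothetical helper, not part of Lucene or Solr: shows the per-term
// logic of "trim, then cap at maxLength" on a plain CharSequence.
final class TrimAndLimit {

    private TrimAndLimit() {}

    /**
     * Trims leading and trailing whitespace, then cuts the term to at most
     * maxLength chars. A negative maxLength disables the limit (assumption).
     */
    static String apply(CharSequence term, int maxLength) {
        int start = 0;
        int end = term.length();

        // Trim leading whitespace.
        while (start < end && Character.isWhitespace(term.charAt(start))) {
            start++;
        }
        // Trim trailing whitespace.
        while (end > start && Character.isWhitespace(term.charAt(end - 1))) {
            end--;
        }

        // Apply the optional length limit.
        if (maxLength >= 0 && end - start > maxLength) {
            end = start + maxLength;
            // Do not cut a surrogate pair in half: drop a dangling high surrogate.
            if (end > start && Character.isHighSurrogate(term.charAt(end - 1))) {
                end--;
            }
        }
        return term.subSequence(start, end).toString();
    }
}
```

In a real TokenFilter the same steps would operate on the term attribute's buffer inside incrementToken(); that wiring is left out here on purpose.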

Changing behavior of StandardAnalyzer

2012-11-13 Thread Bin Lan
So currently, if I use StandardAnalyzer to construct a QueryParser and pass the String "From: some...@gmail.com" to the parser, it returns a query which is "From: someone From: gmail.com". Is there an easy way to change this so it returns "From: "someone gmail.com"" instead? We had an in-house A

Re: Improving search performance for forum search

2012-11-13 Thread Arjen van der Meijden
Thanks Uwe, I was able to rewrite my code with just a few changes to use StraightByteRefDocValuesField for the field with 9 bytes and a PackedLongDocValuesField for the timestamps. The 9 bytes are actually a 1 byte type-identifier, 4 bytes for the topic id and another 4 bytes for the reply i
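As a rough sketch of the 9-byte layout described above (1 type byte, then 4 bytes of topic id, then 4 more bytes), the value can be packed and unpacked with java.nio.ByteBuffer. The class and method names are hypothetical, big-endian byte order is an assumption, and "reply id" is a guess at the field name because the message is cut off at that point. The resulting byte[] is what would then be handed to the DocValues field; that step is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the 9-byte value: [type:1][topicId:4][replyId:4].
// Big-endian order (ByteBuffer's default) is an assumption of this sketch.
final class ForumKeyCodec {

    static final int KEY_LENGTH = 9;

    private ForumKeyCodec() {}

    /** Packs the three parts into a fresh 9-byte array. */
    static byte[] encode(byte type, int topicId, int replyId) {
        return ByteBuffer.allocate(KEY_LENGTH)
                .put(type)
                .putInt(topicId)
                .putInt(replyId)
                .array();
    }

    /** Reads the 1-byte type identifier at offset 0. */
    static byte typeOf(byte[] key) {
        checkLength(key);
        return key[0];
    }

    /** Reads the 4-byte topic id at offset 1. */
    static int topicIdOf(byte[] key) {
        checkLength(key);
        return ByteBuffer.wrap(key).getInt(1);
    }

    /** Reads the 4-byte reply id at offset 5. */
    static int replyIdOf(byte[] key) {
        checkLength(key);
        return ByteBuffer.wrap(key).getInt(5);
    }

    private static void checkLength(byte[] key) {
        if (key.length != KEY_LENGTH) {
            throw new IllegalArgumentException("expected " + KEY_LENGTH + " bytes, got " + key.length);
        }
    }
}
```

A fixed-width layout like this keeps every value the same size, which is what makes a "straight" fixed-length bytes field a natural fit.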

Re: content disappears in the index

2012-11-13 Thread Geoff Cooney
Hi, I've been following this thread and happen to have a simple TruncatingFilter class I wrote for the same purpose. I think this should do what you want: import java.io.IOException; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenStream; import org.apach

Re: content disappears in the index

2012-11-13 Thread Erick Erickson
There's nothing in Solr that I know of that does this. It would be a pretty easy custom filter to create, though. FWIW, Erick On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote: > On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling > wrote: > > By the way, why does TrimFilter option updateOffse

Re: content disappears in the index

2012-11-13 Thread Robert Muir
On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling wrote: > By the way, why does TrimFilter option updateOffset defaults to false, > just keep it backwards compatible? > In my opinion this option should be removed. TokenFilters shouldn't muck with offsets, for a lot of reasons, but especially becau

Re: Combining The results from DB and Index Regd.,

2012-11-13 Thread kiwi clive
You could do it in page-size chunks. Get the db to do the searching and sorting and return the top page-size records. Do the same for the index. You then can build a ramindex that takes the db output and index output and creates 2*pagesize entries. Apply the same sorting mechanism and return
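The page-size merge can be sketched as follows. Note that this swaps the RAM index for a plain in-memory list, which is enough to show the idea: take the top page from each source, combine the 2*pagesize entries, re-apply the same sort, and keep one page. `PageMerger` and `mergeTopPage` are hypothetical names, and the sketch only covers the first page of results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: merges the top page from the database with the top
// page from the index. A plain list stands in for the RAM index here.
final class PageMerger {

    private PageMerger() {}

    /**
     * @param dbPage    top pageSize records from the database, already sorted
     * @param indexPage top pageSize hits from the index, already sorted
     * @param order     the shared sort order used by both sources
     * @param pageSize  number of entries to return
     */
    static <T> List<T> mergeTopPage(List<T> dbPage, List<T> indexPage,
                                    Comparator<? super T> order, int pageSize) {
        // Up to 2 * pageSize candidates in total.
        List<T> merged = new ArrayList<>(dbPage.size() + indexPage.size());
        merged.addAll(dbPage);
        merged.addAll(indexPage);

        // Apply the same sorting mechanism to the combined set.
        merged.sort(order);

        // Keep only the first page.
        int limit = Math.min(pageSize, merged.size());
        return new ArrayList<>(merged.subList(0, limit));
    }
}
```

Because each source contributes its own best pageSize entries, the best pageSize entries overall are guaranteed to be among the candidates, so the first page comes out correct.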

Re: Combining The results from DB and Index Regd.,

2012-11-13 Thread selvakumar netaji
Thanks Clive. Can we index this way if RAM is limited? There would be two indexes, one in the file system and another in-memory index, as already mentioned. If the in-memory index has reached a threshold, can we then force the manual indexing of the databases which is supposed t

Re: Combining The results from DB and Index Regd.,

2012-11-13 Thread kiwi clive
I have used the last solution you mention many times to good effect, as you can sort across the two data sources and merge the results. Obviously it depends on your architecture, RAM and the amount of data you are dealing with. Clive From: selvakumar netaj

RE: Improving search performance for forum search

2012-11-13 Thread Uwe Schindler
IndexReader.document() is documented to be used only for presenting search results. Fetching the document for every possible hit while scoring is the performance killer (it is funny that your query only takes 300 ms, maybe the SSD). The correct solution is to use the new field type DocValues, wh