FuzzyQuery minimumSimilarity

2012-11-05 Thread Damian Birchler
Hi there, Lucene calculates the string similarity between two strings s1 and s2 according to the formula Similarity = Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2)). I would have thought Lucene would divide by the length of the longer string. In particular, the above formula could - in m
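
For readers who want to try the arithmetic in the preview above, here is a minimal sketch in plain Python. It is not Lucene code and does not claim to mirror Lucene's implementation. It only evaluates the ratio exactly as quoted in the message, once with the shorter length as divisor and once with the longer. The function names `levenshtein` and `ratio` are made up for this illustration.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # deleting i characters turns s1[:i] into the empty string
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ratio(s1: str, s2: str, divisor=min) -> float:
    """Distance divided by min (as quoted) or max (the alternative) of the two lengths."""
    length = divisor(len(s1), len(s2))
    if length == 0:
        raise ValueError("ratio is undefined when the chosen length is zero")
    return levenshtein(s1, s2) / length


if __name__ == "__main__":
    print(ratio("ab", "abcdefgh", min))  # 6 edits / length 2 = 3.0
    print(ratio("ab", "abcdefgh", max))  # 6 edits / length 8 = 0.75
```

With the shorter length as divisor the ratio is not bounded by 1 when the two lengths differ a lot, whereas dividing by the longer length always keeps it between 0 and 1.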

Overriding DefaultSimilarity to not consider tf/idf and friends

2012-11-05 Thread Damian Birchler
Hi everyone, we are using Lucene to search for possible duplicates in an address database. We create an index with a document for each person in the database. Each document has a field with one term for the first name, a field with one term for the last name, and so on. I think in this setting it

search-time Field.setBoost()

2012-08-27 Thread Damian Birchler
Hello list, I'm looking for something like Field.setBoost(float boost) that can be set at search time. The reason for this is that we would like to provide user-configurable (client-side) search queries, where the user can assign weights to the fields (all fields, not just those mentioned in the

Use an analyzer with Term, FuzzyQuery, BooleanQuery and friends

2012-08-27 Thread Damian Birchler
Hello list, I build my queries programmatically with Term, NumericTerm, FuzzyQuery, BooleanQuery etc. In particular, I do not use QueryParser to build my query from a string. Still, I would like to first run the values for my terms through an analyzer (more precisely, the same analyzer that I us