RE: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-22 Thread Fuad Efendi
http://issues.apache.org/jira/browse/LUCENE-2230
Enjoy!


 -----Original Message-----
 From: Fuad Efendi [mailto:f...@efendi.ca]
 Sent: January-19-10 11:32 PM
 To: solr-user@lucene.apache.org
 Subject: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree
 
 Hi,
 
 
 I am wondering: will SOLR or Lucene use caches for fuzzy searches? I mean
 per-term caching or something, internal to Lucene, or maybe SOLR (SOLR may
 use its own query parser)...
 
 Anyway, I implemented a BK-Tree and am playing with it right now; I altered
 the FuzzyTermEnum class of Lucene...
 http://en.wikipedia.org/wiki/BK-tree
 
 - it seems performance of fuzzy searches was boosted at least a hundred
 times, but I need to do more tests... repeated similar (slightly different)
 queries run with better performance, probably because of OS-level file
 caching... but it could be because of the BK-Tree distance! (although I need
 to use a classic int distance instead of the float distance used by
 Lucene/Levenshtein etc.)
 
 Thanks,
 Fuad Efendi
 +1 416-993-2060
 http://www.tokenizer.ca/
 Data Mining, Vertical Search
 
 
 





SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-19 Thread Fuad Efendi
Hi,


I am wondering: will SOLR or Lucene use caches for fuzzy searches? I mean
per-term caching or something, internal to Lucene, or maybe SOLR (SOLR may
use its own query parser)...

Anyway, I implemented a BK-Tree and am playing with it right now; I altered
the FuzzyTermEnum class of Lucene...
http://en.wikipedia.org/wiki/BK-tree

- it seems performance of fuzzy searches was boosted at least a hundred
times, but I need to do more tests... repeated similar (slightly different)
queries run with better performance, probably because of OS-level file
caching... but it could be because of the BK-Tree distance! (although I need
to use a classic int distance instead of the float distance used by
Lucene/Levenshtein etc.)
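
For anyone not familiar with the structure, here is a minimal sketch of a BK-tree over an integer Levenshtein distance. It is written in Python for brevity and is not the code from the altered FuzzyTermEnum; the names levenshtein, BKTree, add and search are hypothetical, chosen only for illustration. It shows why an int distance is convenient: children are keyed by their exact distance to the parent, and the triangle inequality lets a lookup skip every child whose key falls outside [d - max_dist, d + max_dist].

```python
def levenshtein(a: str, b: str) -> int:
    """Classic integer edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Illustrative BK-tree; a node is a (term, {distance: child}) pair."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, term: str) -> None:
        if self.root is None:
            self.root = (term, {})
            return
        node = self.root
        while True:
            d = self.distance(term, node[0])
            if d == 0:
                return  # term is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (term, {})
                return
            node = child

    def search(self, query: str, max_dist: int):
        """Return sorted (distance, term) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            term, children = stack.pop()
            d = self.distance(query, term)
            if d <= max_dist:
                results.append((d, term))
            # Triangle inequality: only these subtrees can hold matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)


terms = ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]
tree = BKTree()
for t in terms:
    tree.add(t)
matches = tree.search("bok", 1)
```

With this term list, matches holds the terms within one edit of "bok". Whether the pruning pays off on a real term dictionary depends on the distance threshold and on term lengths, so this says nothing about the speedup mentioned above.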

Thanks,
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/
Data Mining, Vertical Search