Hi Alejandro,

N-grams <http://en.wikipedia.org/wiki/N-gram> might be a good fit.
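
To make the idea concrete, here is a rough sketch in Python. It is not Solr code and not how Lucene actually scores; the function names char_ngrams and overlap_ratio are made up for illustration. It splits strings into character bigrams and ranks your example cities by the share of each city's bigrams that also occur in the query:

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the character n-grams of text, in order (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def overlap_ratio(query, candidate, n=2):
    """Fraction of the candidate's n-grams that also occur in the query.

    This is an illustrative measure only, not Lucene's scoring formula.
    """
    query_grams = Counter(char_ngrams(query, n))
    candidate_grams = Counter(char_ngrams(candidate, n))
    total = sum(candidate_grams.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared gram.
    shared = sum((query_grams & candidate_grams).values())
    return shared / total


cities = ["london", "south west london", "londonderry", "oxford"]
ranked = sorted(cities, key=lambda city: overlap_ratio("london", city), reverse=True)
print(ranked)
```

With your example data this puts london first, then londonderry, then south west london, with oxford last. The ratios are computed over bigrams rather than characters, so they will not equal your 6/11 and 6/17 figures exactly.
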
Using bigrams (n-grams of length 2) for "london", you'd get the tokens "lo", "on", "nd", "do", "on". This should provide the hit ordering you want: a city whose grams mostly match the query should tend to rank above a longer one in which the matching grams are only a small share.

Although it's not listed on Solr's analysis factories wiki page <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters>, there is an NGramFilterFactory, with attributes maxGramSize and minGramSize. See the example usage in the javadocs here: <http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramFilterFactory.html>.

There is also a tokenizer variant: <http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramTokenizerFactory.html>.

Steve

-----Original Message-----
From: Alejandro Cuesta [mailto:alejandro.cue...@gmail.com]
Sent: Wednesday, May 16, 2012 12:51 PM
To: solr-user@lucene.apache.org
Subject: Sort by length percentage match

Hi,

I have a field containing "cities" and I'd like to sort the results based on length percentage match.

Example: Assuming I've got these cities in the index:

london, south west london, londonderry, oxford

And I search for "london", I'd like to get a list sorted like this:

london (6/6, 100% match)
londonderry (6/11, 54% match)
south west london (6/17, 35% match)

I know Lucene uses a different scoring algorithm based on term frequency and inverse document frequency (tf & idf), but in my specific example I need to use this scoring strategy.

Can anyone give me a clue or a starting point, please? Is there a better technology to perform this kind of search?

Thanks,
Alejandro