LetterTokenizerFactory will emit each contiguous sequence of letters as a token and discard everything else. Tokens such as http, https, and com would then need to be added as stopwords.
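
As a rough sketch of that first option (not tested against your schema): the field type name "text_url" and the file "url_stopwords.txt" are names I made up for illustration. The stopword file would list http, https, www, com and so on, one per line.

```xml
<!-- Hypothetical field type; "text_url" and "url_stopwords.txt" are assumed names -->
<fieldType name="text_url" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute is used for both index and query -->
  <analyzer>
    <!-- Splits on anything that is not a letter:
         http://mail.google.com/dlkjadf becomes http, mail, google, com, dlkjadf -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drops the URL noise words listed in the assumed stopword file -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="url_stopwords.txt"/>
  </analyzer>
</fieldType>
```

Note that LetterTokenizerFactory also throws away digits, so this is only a fit if numeric parts of the URL do not need to be searchable.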
Alternatively, you can try PatternTokenizerFactory with a regular expression if you are looking for a specific part of the URL.

On Sep 23, 2010, at 10:59 PM, Max Lynch wrote:

> Is there a tokenizer that will allow me to search for parts of a URL? For
> example, the search "google" would match on the data
> "http://mail.google.com/dlkjadf"
>
> This tokenizer factory doesn't seem to be sufficient:
>
> <fieldType name="text_standard" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Thanks.
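
For the PatternTokenizerFactory route, here is a minimal sketch that assumes the part you care about is the host name. The field type name "text_url_host" and the regular expression are my own assumptions, not something taken from your schema, so treat it as a starting point only.

```xml
<!-- Hypothetical field type; the name and the regex are illustrative assumptions -->
<fieldType name="text_url_host" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- group="1" emits only the first capture group of each match,
         so http://mail.google.com/dlkjadf yields the token mail.google.com -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="https?://([^/]+)" group="1"/>
    <!-- Splits the host on the dots so that mail, google and com are indexed separately -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- The index-time pattern would not match a bare term such as "google",
         so the query side uses a plain tokenizer instead -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The separate query analyzer matters here: with group="1" the tokenizer only produces tokens where the pattern matches, so running the same pattern over a plain search term would likely produce nothing.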