Re: Search a URL
WordDelimiterFilter

On Friday 24 September 2010 02:42:52 Dennis Gearon wrote:
> WDF is not WTF (what I think when I see WDF), right ;-)
>
> What is WDF?
Markus Jelsma - Technical Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
RE: Search a URL
WDF is not WTF (what I think when I see WDF), right ;-)

What is WDF?

Dennis Gearon

Signature Warning
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Thu, 9/23/10, Markus Jelsma wrote:

> From: Markus Jelsma
> Subject: RE: Search a URL
> To: solr-user@lucene.apache.org
> Date: Thursday, September 23, 2010, 2:11 PM
>
> Try setting generateWordParts=1 in your WDF. Also, having a
> WhitespaceTokenizer makes little sense for URL's, there should be
> no whitespace in a URL, the StandardTokenizer can tokenize a URL.
> Anyway, the problem is your WDF.
>
> -Original message-
> From: Max Lynch
> Sent: Thu 23-09-2010 23:00
> To: solr-user@lucene.apache.org
> Subject: Search a URL
>
> Is there a tokenizer that will allow me to search for parts of a URL?
> For example, the search "google" would match on the data
> "http://mail.google.com/dlkjadf".
>
> This tokenizer factory doesn't seem to be sufficient:
>
> class="solr.TextField"
> positionIncrementGap="100">
>
> class="solr.WhitespaceTokenizerFactory"/>
> class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts="1"
> catenateWords="1"
> catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="1"/>
> class="solr.LowerCaseFilterFactory"/>
> class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>
> class="solr.WhitespaceTokenizerFactory"/>
>
> class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts="1"
> catenateWords="1"
> catenateNumbers="1" catenateAll="0"
> splitOnCaseChange="1"/>
> class="solr.LowerCaseFilterFactory"/>
> class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>
> Thanks.
RE: Search a URL
Try setting generateWordParts=1 in your WDF. Also, having a WhitespaceTokenizer makes little sense for URLs: there should be no whitespace in a URL, and the StandardTokenizer can tokenize a URL. Anyway, the problem is your WDF.

-Original message-
From: Max Lynch
Sent: Thu 23-09-2010 23:00
To: solr-user@lucene.apache.org
Subject: Search a URL

Is there a tokenizer that will allow me to search for parts of a URL? For example, the search "google" would match on the data "http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

Thanks.
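For illustration, here is a minimal sketch of what the suggested change might look like in schema.xml. The field type name "text_url" is hypothetical, and the stemming filter from the original post is left out to keep the sketch short. The reasoning, as far as it can be inferred from the thread: a URL contains no whitespace, so the WhitespaceTokenizer passes it on as one token, and the WordDelimiterFilter then splits it on non-alphanumeric characters. With generateWordParts="0" those individual parts are not emitted, so a query for "google" has nothing to match. With generateWordParts="1" the parts (http, mail, google, com, dlkjadf) should be emitted as tokens.

```xml
<!-- Sketch only: the name "text_url" is an assumption, not from the thread. -->
<fieldType name="text_url" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <!-- A URL has no whitespace, so it reaches the filters as a single token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts="1" emits the parts between delimiters as separate tokens. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Existing documents would need to be reindexed after a change to the index-time analysis, and the analysis page in the Solr admin interface is a convenient way to check which tokens a sample URL actually produces.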
Re: Search a URL
LetterTokenizerFactory will use each contiguous sequence of letters as a token and discard the rest. http, https, com, etc. would need to be stopwords. Alternatively, you can try PatternTokenizerFactory with a regular expression if you are looking for a specific part of the URL.

On Sep 23, 2010, at 10:59 PM, Max Lynch wrote:

> Is there a tokenizer that will allow me to search for parts of a URL? For
> example, the search "google" would match on the data
> "http://mail.google.com/dlkjadf".
>
> This tokenizer factory doesn't seem to be sufficient:
>
> positionIncrementGap="100">
>
> generateWordParts="0" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> language="English" protected="protwords.txt"/>
>
> generateWordParts="0" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> language="English" protected="protwords.txt"/>
>
> Thanks.
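The two alternatives mentioned above could be sketched roughly as follows. The field type names, the stop word file name "url_stopwords.txt", and the regular expression are all assumptions made for this example and do not come from the thread.

```xml
<!-- Sketch 1: letters only, with URL noise words removed.
     "url_letters" and "url_stopwords.txt" are hypothetical names. -->
<fieldType name="url_letters" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits each contiguous run of letters; digits and punctuation are dropped. -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- url_stopwords.txt would list http, https, www, com, ... one per line. -->
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Sketch 2: extract one specific part of the URL, here the host name.
     "url_host" and the pattern are assumptions. -->
<fieldType name="url_host" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="1" emits the first capture group of each match as the token. -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^[a-zA-Z]+://([^/:]+)" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that the second sketch would index the host as a single token (mail.google.com in the example), so on its own it would not match a search for "google". It suits cases where the whole host, or some other specific part, is what gets searched.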
Search a URL
Is there a tokenizer that will allow me to search for parts of a URL? For example, the search "google" would match on the data "http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

Thanks.