RE: capitalization and delimiters
Hi Shalin,

I mixed up and sent the wrong schema, one that I had been testing with. I was using the same configuration as the example schema with the same results. I re-tested by re-indexing just to confirm.

Also, yes, I do have the lowercase factory after the word delimiter. powerShot does not return the results for 'powershot', only for 'power' and 'shot'.

If I switch the lowercase factory to before the word delimiter, then I do get the results for 'powershot', but may not get the results if just searching 'power' or 'shot'.

Thanks
Audrey

Date: Wed, 14 Oct 2009 23:28:46 +0530
Subject: Re: capitalization and delimiters
From: shalinman...@gmail.com
To: solr-user@lucene.apache.org
CC: au...@hotmail.com

On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote:

> In my search docs, I have content such as 'powershot' and 'powerShot'. I would expect 'powerShot' would be searched as 'power', 'shot' and 'powershot', so that results for all these are returned. Instead, only results for 'power' and 'shot' are returned. Any suggestions?
>
> In the schema, index analyzer:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> In the schema, query analyzer:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>

I find your index-time and query-time configuration very strange. Assuming that you also have a lowercase filter, it seems that a token powerShot will not be split and indexed as powershot. Then during query, both power and shot will match nothing.

I suggest you start with the configuration given in the example schema. Else, it'd be easier for us if you can help us understand the reasons behind changing these parameters.

--
Regards,
Shalin Shekhar Mangar.
Re: capitalization and delimiters
On Fri, Oct 16, 2009 at 9:56 PM, Audrey Foo au...@hotmail.com wrote:

> Hi Shalin,
>
> I mixed up and sent the wrong schema, one that I had been testing with. I was using the same configuration as the example schema with the same results. I re-tested by re-indexing just to confirm.
>
> Also, yes, I do have the lowercase factory after the word delimiter. powerShot does not return the results for 'powershot', only for 'power' and 'shot'.
>
> If I switch the lowercase factory to before the word delimiter, then I do get the results for 'powershot', but may not get the results if just searching 'power' or 'shot'.

OK, thanks for the clarification. You need to add preserveOriginal=1 to your index-time WDF (WordDelimiterFilterFactory) configuration. This will index the original token as well as the parts, so that all of powershot, power and shot should match powerShot.

Make sure you re-index after making the changes.

--
Regards,
Shalin Shekhar Mangar.
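For reference, a minimal sketch of what the index-time analyzer might look like with the suggested attribute added. Only preserveOriginal comes from the reply above. The other attribute values, the enclosing analyzer element and the whitespace tokenizer are assumptions, chosen to resemble the stock example schema that Audrey says she is using, and may differ from her actual file.

```xml
<!-- Index-time analyzer sketch. Everything except preserveOriginal is an
     assumption modelled on the example schema, not taken from the thread. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal="1" keeps the unsplit token (powerShot) in addition
       to the parts produced by splitting on the case change -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1" preserveOriginal="1"/>
  <!-- Lowercasing runs after the word delimiter so the case change is still
       visible when the token is split -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Index-time analysis only runs when a document is added, so existing documents keep their old tokens until they are re-indexed. The analysis page in the Solr admin interface can be used to check which tokens the field type actually emits for powerShot before and after the change.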
Re: capitalization and delimiters
On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote:

> In my search docs, I have content such as 'powershot' and 'powerShot'. I would expect 'powerShot' would be searched as 'power', 'shot' and 'powershot', so that results for all these are returned. Instead, only results for 'power' and 'shot' are returned. Any suggestions?
>
> In the schema, index analyzer:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> In the schema, query analyzer:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>

I find your index-time and query-time configuration very strange. Assuming that you also have a lowercase filter, it seems that a token powerShot will not be split and indexed as powershot. Then during query, both power and shot will match nothing.

I suggest you start with the configuration given in the example schema. Else, it'd be easier for us if you can help us understand the reasons behind changing these parameters.

--
Regards,
Shalin Shekhar Mangar.
capitalization and delimiters
In my search docs, I have content such as 'powershot' and 'powerShot'. I would expect 'powerShot' would be searched as 'power', 'shot' and 'powershot', so that results for all these are returned. Instead, only results for 'power' and 'shot' are returned. Any suggestions?

In the schema, index analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>

In the schema, query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

Thanks
Audrey