RE: capitalization and delimiters

2009-10-16 Thread Audrey Foo

Hi Shalin
I mixed up and sent the wrong schema, one that I had been testing with. 
I was using the same configuration as the example schema with the same results. 
I re-tested by re-indexing just to confirm. Also, yes I do have lowercase 
factory after the word delimiter.
Searching for 'powerShot' does not return the results for 'powershot', only
those for 'power' and 'shot'.
If I put the lowercase filter factory before the word delimiter, then I do get
the results for 'powershot', but may not get the results when searching for
just 'power' or 'shot'.
Thanks
Audrey

 Date: Wed, 14 Oct 2009 23:28:46 +0530
 Subject: Re: capitalization and delimiters
 From: shalinman...@gmail.com
 To: solr-user@lucene.apache.org
 CC: au...@hotmail.com
 
 On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote:
 
 
  In my search docs, I have content such as 'powershot' and 'powerShot'.
  I would expect 'powerShot' would be searched as 'power', 'shot' and
  'powershot', so that results for all these are returned. Instead, only
  results for 'power' and 'shot' are returned.
  Any suggestions?
  In the schema, index analyzer:
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
  generateNumberParts="0" catenateWords="1" catenateNumbers="1"
  catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  In the schema, query analyzer:
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
  catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 
 
 I find your index-time and query-time configuration very strange. Assuming
 that you also have a lowercase filter, it seems that a token powerShot
 will not be split and indexed as powershot. Then during query, both
 power and shot will match nothing.
 
 I suggest you start with the configuration given in the example schema.
 Else, it'd be easier for us if you can help us understand the reasons behind
 changing these parameters.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
  

Re: capitalization and delimiters

2009-10-16 Thread Shalin Shekhar Mangar
On Fri, Oct 16, 2009 at 9:56 PM, Audrey Foo au...@hotmail.com wrote:


 Hi Shalin
 I mixed up and sent the wrong schema, one that I had been testing with.
 I was using the same configuration as the example schema with the same
 results. I re-tested by re-indexing just to confirm. Also, yes I do have
 lowercase factory after the word delimiter.
 Searching for 'powerShot' does not return the results for 'powershot', only
 those for 'power' and 'shot'.
 If I put the lowercase filter factory before the word delimiter, then I do
 get the results for 'powershot', but may not get the results when searching
 for just 'power' or 'shot'.


OK, thanks for the clarification. You need to add preserveOriginal=1 to
your index-time WDF configuration. This will index the original token as
well as the parts so that all of powershot, power and shot should
match powerShot. Make sure you re-index after making the changes.
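
A minimal sketch of that change, assuming the index analyzer otherwise follows
the example schema's text field type. The tokenizer and the other attribute
values shown here are assumptions for illustration, not Audrey's actual schema;
only preserveOriginal="1" is the change suggested above.

```xml
<analyzer type="index">
  <!-- Assumed tokenizer; use whichever tokenizer the field type already has. -->
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal="1" keeps the unsplit token (powerShot) alongside the
       parts produced by splitting on the case change (power, Shot). -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1" preserveOriginal="1"/>
  <!-- Lowercasing runs after the word delimiter so that the case change is
       still visible when the token is split. -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The analysis page in the Solr admin interface can be used to check which
tokens the index and query analyzers actually emit for powerShot.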

-- 
Regards,
Shalin Shekhar Mangar.


Re: capitalization and delimiters

2009-10-14 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote:


 In my search docs, I have content such as 'powershot' and 'powerShot'.
 I would expect 'powerShot' would be searched as 'power', 'shot' and
 'powershot', so that results for all these are returned. Instead, only
 results for 'power' and 'shot' are returned.
 Any suggestions?
 In the schema, index analyzer:
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
 generateNumberParts="0" catenateWords="1" catenateNumbers="1"
 catenateAll="0"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 In the schema, query analyzer:
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
 catenateAll="0" splitOnCaseChange="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>


I find your index-time and query-time configuration very strange. Assuming
that you also have a lowercase filter, it seems that a token powerShot
will not be split and indexed as powershot. Then during query, both
power and shot will match nothing.

I suggest you start with the configuration given in the example schema.
Else, it'd be easier for us if you can help us understand the reasons behind
changing these parameters.

-- 
Regards,
Shalin Shekhar Mangar.


capitalization and delimiters

2009-10-12 Thread Audrey Foo


In my search docs, I have content such as 'powershot' and 'powerShot'.
I would expect 'powerShot' would be searched as 'power', 'shot' and 
'powershot', so that results for all these are returned. Instead, only results 
for 'power' and 'shot' are returned.
Any suggestions?
In the schema, index analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
generateNumberParts="0" catenateWords="1" catenateNumbers="1"
catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
In the schema, query analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
Thanks
Audrey