Re: Solr schema filters

2008-01-16 Thread Chris Hostetter
: For this exact example, use the WordDelimiterFilter exactly as
: configured in the text fieldType in the example schema that ships
: with solr.  The trick is to then use some slop when querying.
: 
: FT-50-43 will be indexed as FT, 50, 43 / 5043  (the last two tokens
: are in the same position).
: Now when querying, FT-5043 won't match without slop because there is
: a 50 token in the middle of the indexed terms... so try "FT-5043"~1

FYI: this was the motivation for the qs param on dismax ... 

http://localhost:8983/solr/select?debugQuery=true&qt=dismax&pf=&qf=text&q=FT-5043&qs=3
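The archived URL appears to have lost its `&` separators. Assuming the parameters were debugQuery, qt, an empty pf, qf, q and qs, a small Python sketch can rebuild the request so the separators and escaping come out right. The parameter values are read off the URL above; nothing here is specific to any Solr client library.

```python
from urllib.parse import urlencode

# Parameters as read from the archived URL (assumption: the "&" separators
# were stripped by the archive, and pf was intentionally left empty).
params = {
    "debugQuery": "true",  # show how the query was parsed
    "qt": "dismax",        # use the dismax request handler
    "pf": "",              # no phrase-boost fields
    "qf": "text",          # search the "text" field
    "q": "FT-5043",        # the user's input, unquoted
    "qs": "3",             # query slop applied to phrase queries from q
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
# http://localhost:8983/solr/select?debugQuery=true&qt=dismax&pf=&qf=text&q=FT-5043&qs=3
```

With qs set on the handler, users can type FT-5043 without adding the ~N slop syntax themselves.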


-Hoss



Re: Solr schema filters

2008-01-11 Thread Yonik Seeley
On Jan 10, 2008 8:51 PM, Brian Artiaco [EMAIL PROTECTED] wrote:
 I'm kinda under the gun on this one. I thought I would be able to
 solve this problem using the different Tokenizers and Query Analyzers
 that come with Solr, but I seem to be running into a brick wall.

 I'm currently using Acts_as_solr 0.9.

 So the requirements of my project are these: I need to configure my
 solr server so that when I have this field indexed (sku_name_t:
 FT-50-43), it will show up as a valid result for the following queries:
 FT, 50, 43, FT5043, FT50-43, and FT-5043

For this exact example, use the WordDelimiterFilter exactly as
configured in the text fieldType in the example schema that ships
with solr.  The trick is to then use some slop when querying.

FT-50-43 will be indexed as FT, 50, 43 / 5043  (the last two tokens
are in the same position).
Now when querying, FT-5043 won't match without slop because there is
a 50 token in the middle of the indexed terms... so try "FT-5043"~1
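To make the slop arithmetic concrete, here is a rough Python sketch. It is a simplified model of Lucene's sloppy phrase matching, not Lucene code, and it assumes the lowercased index tokens and positions described above (ft@0, 50@1, 43@2, 5043@2) and that the query analyzer turns FT-5043 (and FT5043) into ft, 5043 without catenating. The helper names are hypothetical.

```python
from itertools import product

# Index-time tokens for "FT-50-43" as described above: parts at consecutive
# positions, with the catenated "5043" stacked on the same position as "43".
# Lowercasing is assumed.
INDEXED = [("ft", 0), ("50", 1), ("43", 2), ("5043", 2)]

def positions_by_term(tokens):
    """Map each term to the list of positions it occupies."""
    index = {}
    for term, pos in tokens:
        index.setdefault(term, []).append(pos)
    return index

def min_slop(query_terms, tokens):
    """Smallest slop at which the phrase matches, or None if a term is missing.

    Simplified model: a query term at phrase offset i found at document
    position p contributes p - i; the match "length" is the spread between
    the largest and smallest of those values.
    """
    index = positions_by_term(tokens)
    if any(t not in index for t in query_terms):
        return None
    best = None
    for choice in product(*(index[t] for t in query_terms)):
        shifted = [p - i for i, p in enumerate(choice)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best

print(min_slop(["ft", "5043"], INDEXED))      # 1  (FT-5043 or FT5043)
print(min_slop(["ft", "50", "43"], INDEXED))  # 0  (FT50-43)
```

Under these assumptions FT-5043 needs a slop of 1, matching the ~1 suggested above, while FT50-43 lines up exactly.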

-Yonik






Solr schema filters

2008-01-10 Thread Brian Artiaco
I'm kinda under the gun on this one. I thought I would be able to
solve this problem using the different Tokenizers and Query Analyzers
that come with Solr, but I seem to be running into a brick wall.

I'm currently using Acts_as_solr 0.9.

So the requirements of my project are these: I need to configure my
solr server so that when I have this field indexed (sku_name_t:
FT-50-43), it will show up as a valid result for the following queries:
FT, 50, 43, FT5043, FT50-43, and FT-5043

The basic goal behind this requirement is that many people see these
part numbers in hobby magazines, and when they search for the part,
they will often put in incorrect dashes, or no dash at all, etc., but
they will usually at least have the letters/numbers correct.

With the schema as it is, all of the queries work, EXCEPT for
FT-5043 and FT5043.

Looking at solr's documentation here
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089)
I believe that properly changing the solr.WordDelimiterFilterFactory
filter settings in schema.xml should provide the results I need.

As near as I can tell, this is line 55 of schema.xml (I'm using AAS 0.9,
if it matters):
<filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="0"
    catenateNumbers="0" catenateAll="1"/>

The default is catenateAll=0. My understanding from reading the solr
docs (please correct me if I'm wrong) is that catenateAll on FT-50-43
should result in an index of FT5043 in addition to the other options
(I believe this is referred to as index expansion). When I use the
solr admin analyzer tool, it appears to do that, but the find_by_solr
query for FT5043 still doesn't return any results.
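One likely reason the catenated token doesn't help by itself is that the query string goes through a word delimiter as well. The Python sketch below is a hypothetical, much-simplified model of that splitting, not Solr's WordDelimiterFilter: it only breaks on non-alphanumerics and letter/digit boundaries, and it places the catenated form at the last part's position, as the replies in this thread describe for 5043.

```python
import re

# Hypothetical simplified splitter: runs of letters or runs of digits become
# separate parts; dashes and other punctuation are dropped. Case-change
# splitting and the other WordDelimiterFilter options are not modeled.
PART = re.compile(r"[A-Za-z]+|[0-9]+")

def word_delimit(text, catenate_all=False):
    """Return (term, position) pairs for one whitespace-free input token."""
    parts = PART.findall(text)
    tokens = [(p.lower(), i) for i, p in enumerate(parts)]
    if catenate_all and len(parts) > 1:
        # Catenated form stacked on the position of the last part.
        tokens.append(("".join(parts).lower(), len(parts) - 1))
    return tokens

print(word_delimit("FT-50-43", catenate_all=True))
# [('ft', 0), ('50', 1), ('43', 2), ('ft5043', 2)]
print(word_delimit("FT5043"))
# [('ft', 0), ('5043', 1)]
```

Under this model the index gets ft5043, but if the query side does not catenate, the query FT5043 becomes the two-term phrase "ft 5043", so the single ft5043 token is never looked up. Note also that with catenateNumbers="0", as in the line quoted above, there would be no standalone 5043 token for that phrase to hit.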

I've also tried playing around with the analyzer on line 64:
<filter class="solr.WordDelimiterFilterFactory"
    generateWordParts="1" generateNumberParts="1" catenateWords="0"
    catenateNumbers="0" catenateAll="1"/>

I've been banging my head against the wall, trying to come up with
the magic combination that will provide me with the results I need,
and I would greatly appreciate some feedback. Or if someone can point
me toward a better solr filter for this, that would help too. I'm
going to try to post this to one of the solr lists too, and if I get
a solution there, I'll be sure to share it with you guys.

Brian Artiaco
Blue Hill Solutions