Re: Solr schema filters
: For this exact example, use the WordDelimiterFilter exactly as
: configured in the text fieldType in the example schema that ships
: with solr. The trick is to then use some slop when querying.
:
: FT-50-43 will be indexed as FT, 50, 43 / 5043 (the last two tokens
: are in the same position).
: Now when querying, FT-5043 won't match without slop because there is
: a 50 token in the middle of the indexed terms... so try FT-5043~1

FYI: this was the motivation for the qs param on dismax ...

http://localhost:8983/solr/select?debugQuery=true&qt=dismax&pf=&qf=text&q=FT-5043&qs=3

-Hoss
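For anyone scripting this, here is a minimal Python sketch of building the same dismax request. The base URL, the `text` field and the parameter values are copied from the example request above; the helper name `dismax_url` is made up for illustration and is not part of Solr or acts_as_solr.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the example request in this message.
BASE_URL = "http://localhost:8983/solr/select"


def dismax_url(query, query_slop, field="text"):
    """Build a dismax request URL with query-phrase slop (qs)."""
    params = [
        ("debugQuery", "true"),   # ask Solr to explain the parsed query
        ("qt", "dismax"),         # select the dismax handler
        ("pf", ""),               # no phrase-boost fields in this example
        ("qf", field),            # field(s) to search
        ("q", query),             # raw user input, e.g. a part number
        ("qs", str(query_slop)),  # slop applied to phrases built from q
    ]
    return BASE_URL + "?" + urlencode(params)


print(dismax_url("FT-5043", 3))
```

The point of qs is that the user can type a plain part number and never has to add the ~1 suffix by hand.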
Re: Solr schema filters
On Jan 10, 2008 8:51 PM, Brian Artiaco [EMAIL PROTECTED] wrote:
> I'm kinda under the gun for this problem, and I thought that I would be able to solve it using the different Tokenizers and Query Analyzers that come with Solr, but I seem to be running into a brick wall. I'm currently using Acts_as_solr 0.9.
>
> The requirement for my project is this: I need to configure my solr server so that when I have this field indexed:
>
>     sku_name_t: FT-50-43
>
> it will show up as a valid result for the following queries: FT, 50, 43, FT5043, FT50-43, and FT-5043

For this exact example, use the WordDelimiterFilter exactly as configured in the text fieldType in the example schema that ships with solr. The trick is to then use some slop when querying.

FT-50-43 will be indexed as FT, 50, 43 / 5043 (the last two tokens are in the same position).
Now when querying, FT-5043 won't match without slop because there is a 50 token in the middle of the indexed terms... so try FT-5043~1

-Yonik

> The basic goal behind this requirement is that many people see these part numbers in hobby magazines, and when they search for the part, many times they will put in incorrect dashes, or no dash at all, etc, but they will usually at least have the letters/numbers correct.
>
> With the schema as it is, all of the queries work, EXCEPT for FT-5043 and FT5043.
>
> Looking at solr's documentation here (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089) I believe that properly changing the parameters in the solr.WordDelimiterFilterFactory tokenizer/analyzer fields in schema.xml should provide the results I need.
>
> As near as I can tell, the relevant line is line 55 of schema.xml (I'm using AAS 0.9, if it matters):
>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
>
> The default is for catenateAll=0. My understanding from reading the solr docs (please correct me if I'm wrong) is that catenateAll on FT-50-43 should result in an index of FT5043 in addition to the other options (I believe this is referred to as Index Expansion). And when I use the solr admin analyzer tool, it appears to do that, but the find_by_solr query for FT5043 still doesn't return any results.
>
> I've also tried playing around with the analyzer on line 64:
>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1"/>
>
> I've been banging my head against the wall, trying to come up with the magic combination that will provide me with the results I need, and I would greatly appreciate some feedback. Or if there's a better solr filter out there that someone can point me in the direction of, it would be greatly appreciated.
>
> I'm going to try and post this to one of the solr lists too, and if I get a solution there, I'll be sure to share it with you guys.
>
> Brian Artiaco
> Blue Hill Solutions
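To make Yonik's explanation concrete, here is a rough Python model of the token positions and of sloppy phrase matching. It is a toy sketch, not Solr or Lucene code. It assumes each input is a single part number with no whitespace, that the index-side filter emits word parts, number parts, and catenated runs of adjacent numbers (which gives the FT, 50, 43 / 5043 output he describes), that the query side only splits into parts, and that a phrase matches when the spread of position shifts is at most the slop. Lowercasing is ignored, and the function names are made up for illustration.

```python
import re
from itertools import product


def split_parts(text):
    """Split on delimiters and on letter/digit transitions."""
    return re.findall(r"[A-Za-z]+|[0-9]+", text)


def index_tokens(text):
    """Index-side model: every part, plus catenated runs of number parts.

    A catenated token shares the position of the last part in its run.
    """
    parts = split_parts(text)
    tokens = [(part, pos) for pos, part in enumerate(parts)]
    i = 0
    while i < len(parts):
        j = i
        while j < len(parts) and parts[j].isdigit():
            j += 1
        if j - i > 1:  # two or more adjacent number parts
            tokens.append(("".join(parts[i:j]), j - 1))
        i = max(j, i + 1)
    return tokens


def query_terms(text):
    """Query-side model: split into parts only, no catenation."""
    return split_parts(text)


def phrase_matches(doc_tokens, terms, slop=0):
    """True if the terms occur in the document as a phrase within slop."""
    if not terms:
        return False
    positions = [[pos for term, pos in doc_tokens if term == t] for t in terms]
    for combo in product(*positions):
        # How far each term sits from where an exact phrase would put it.
        shifts = [doc_pos - i for i, doc_pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


doc = index_tokens("FT-50-43")
print(doc)  # [('FT', 0), ('50', 1), ('43', 2), ('5043', 2)]

for q in ["FT", "50", "43", "FT50-43", "FT5043", "FT-5043"]:
    terms = query_terms(q)
    print(q, phrase_matches(doc, terms, slop=0), phrase_matches(doc, terms, slop=1))
```

In this model FT5043 and FT-5043 both become the two terms FT and 5043. In the indexed document, 5043 sits two positions after FT instead of one, so the exact phrase fails and a slop of 1 is enough to bridge the gap left by the 50 token.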
Solr schema filters
I'm kinda under the gun for this problem, and I thought that I would be able to solve it using the different Tokenizers and Query Analyzers that come with Solr, but I seem to be running into a brick wall. I'm currently using Acts_as_solr 0.9.

The requirement for my project is this: I need to configure my solr server so that when I have this field indexed:

    sku_name_t: FT-50-43

it will show up as a valid result for the following queries: FT, 50, 43, FT5043, FT50-43, and FT-5043

The basic goal behind this requirement is that many people see these part numbers in hobby magazines, and when they search for the part, many times they will put in incorrect dashes, or no dash at all, etc, but they will usually at least have the letters/numbers correct.

With the schema as it is, all of the queries work, EXCEPT for FT-5043 and FT5043.

Looking at solr's documentation here (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089) I believe that properly changing the parameters in the solr.WordDelimiterFilterFactory tokenizer/analyzer fields in schema.xml should provide the results I need.

As near as I can tell, the relevant line is line 55 of schema.xml (I'm using AAS 0.9, if it matters):

    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1"/>

The default is for catenateAll=0. My understanding from reading the solr docs (please correct me if I'm wrong) is that catenateAll on FT-50-43 should result in an index of FT5043 in addition to the other options (I believe this is referred to as Index Expansion). And when I use the solr admin analyzer tool, it appears to do that, but the find_by_solr query for FT5043 still doesn't return any results.

I've also tried playing around with the analyzer on line 64:

    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1"/>

I've been banging my head against the wall, trying to come up with the magic combination that will provide me with the results I need, and I would greatly appreciate some feedback. Or if there's a better solr filter out there that someone can point me in the direction of, it would be greatly appreciated.

I'm going to try and post this to one of the solr lists too, and if I get a solution there, I'll be sure to share it with you guys.

Brian Artiaco
Blue Hill Solutions