Re: parsing strings into phrase queries
Thanks Robert, that helped.

On Thu, Feb 18, 2010 at 5:48 AM, Robert Muir rcm...@gmail.com wrote:
> I gave it a rough shot, Lance. If there's a better way to explain it, please edit.

--
Lance Norskog
goks...@gmail.com
Re: parsing strings into phrase queries
I gave it a rough shot, Lance. If there's a better way to explain it, please edit.

On Wed, Feb 17, 2010 at 10:23 PM, Lance Norskog goks...@gmail.com wrote:
> That would be great. After reading this and the PositionFilter class I still don't know how to use it.

--
Robert Muir
rcm...@gmail.com
Re: parsing strings into phrase queries
The PositionFilter worked great for my purpose, along with another filter that I built.

In my case, my indexed data may be something like X150, so a query for Nokia X150 should match, but I don't want random matches on x. However, if my indexed data is G7, I do want a query on PowerShot G7 to match on g and 7, so a simple length filter will not do. Instead I built a custom filter (that I am willing to contribute back) that filters out singletons that are surrounded by longer tokens (3 or more characters by default). So PowerShot G7 becomes power shot g 7, but Nokia X150 becomes nokia 150. I then feed the results of this into a PositionFilter. This allows Nokia X150ABC to match against the X150 part. So far I really like this for partial part number searches.

To boost exact matches, I used copyField to create another field without PositionFilter, and then did an optional phrase query on that.

From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, February 17, 2010 7:23:23 PM
Subject: Re: parsing strings into phrase queries

> That would be great. After reading this and the PositionFilter class I still don't know how to use it.
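The singleton rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the message, not the actual filter (which would be a Lucene TokenFilter written in Java). The function name `drop_enclosed_singletons` and the parameter `min_neighbor_len` are hypothetical, and the treatment of a singleton at the very start or end of the token list (kept here, since it is not surrounded on both sides) is an assumption.

```python
def drop_enclosed_singletons(tokens, min_neighbor_len=3):
    """Drop one-character tokens whose neighbors on BOTH sides are long.

    Assumption: a singleton at the start or end of the list is kept,
    because it is not surrounded on both sides.
    """
    kept = []
    for i, tok in enumerate(tokens):
        if len(tok) == 1 and 0 < i < len(tokens) - 1:
            left, right = tokens[i - 1], tokens[i + 1]
            if len(left) >= min_neighbor_len and len(right) >= min_neighbor_len:
                continue  # singleton enclosed by long tokens: filter it out
        kept.append(tok)
    return kept


# Token lists as they might look after word-delimiter splitting and lowercasing.
print(drop_enclosed_singletons(["power", "shot", "g", "7"]))  # ['power', 'shot', 'g', '7']
print(drop_enclosed_singletons(["nokia", "x", "150"]))        # ['nokia', '150']
```

In the first call "g" survives because its right-hand neighbor "7" is short; in the second, "x" sits between two tokens of length 3 or more and is removed.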
Re: parsing strings into phrase queries
This sounds useful to me! Here's a pointer: http://wiki.apache.org/solr/HowToContribute

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

From: Kevin Osborn osbo...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 1:15:11 PM
Subject: Re: parsing strings into phrase queries

> The PositionFilter worked great for my purpose along with another filter that I built. [...]
Re: parsing strings into phrase queries
: take a look at PositionFilter

Right, there was another thread recently where almost the exact same issue was discussed...

http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html

...except that I was ignorant of the existence of PositionFilter when I wrote that message.

-Hoss
Re: parsing strings into phrase queries
I think we can improve the docs/wiki to show this example use case. I noticed the wiki explanation for this filter gives a more complex shingles example, which is interesting, but this seems to be a common problem, and maybe we should add this use case.

On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> : take a look at PositionFilter
>
> Right, there was another thread recently where almost the exact same issue was discussed...
>
> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>
> ...except that I was ignorant of the existence of PositionFilter when I wrote that message.
>
> -Hoss

--
Robert Muir
rcm...@gmail.com
Re: parsing strings into phrase queries
That would be great. After reading this and the PositionFilter class I still don't know how to use it.

On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir rcm...@gmail.com wrote:
> I think we can improve the docs/wiki to show this example use case. I noticed the wiki explanation for this filter gives a more complex shingles example, which is interesting, but this seems to be a common problem, and maybe we should add this use case.

--
Lance Norskog
goks...@gmail.com
Re: parsing strings into phrase queries
I don't see a good way to fix this without some heuristic you'd have to implement to munge your query. There's no good way for SOLR to intuit that what you want is a partial match in this case.

If you can create some rules that would be good enough, like "remove any single letters after numbers in the query", you might get something satisfactory. But I don't know of a way for SOLR to do this for you.

And if you *can't* write a good enough rule, this sounds like an intractable problem.

Not much help, I know.

On Sat, Feb 13, 2010 at 1:13 AM, Kevin Osborn osbo...@yahoo.com wrote:
> Right now if I have the query model:(Nokia BH-212V), the parser turns this into +(model:nokia model:"bh 212 v"). The problem is that I might have a model called Nokia BH-212, so this is completely missed. In my case, I would like my query to be +(model:nokia model:bh model:212 model:v).
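As a rough illustration of the kind of client-side rule suggested above ("remove any single letters after numbers in the query"), here is a minimal Python sketch. The function name `strip_trailing_letter` is hypothetical, and the exact pattern (one ASCII letter directly after a digit, at the end of a word) is just one possible reading of that rule.

```python
import re

# One ASCII letter that directly follows a digit and ends the word,
# e.g. the "V" in "BH-212V". Longer suffixes such as "ABC" are left alone.
_TRAILING_LETTER = re.compile(r"(?<=\d)[A-Za-z]\b")


def strip_trailing_letter(query):
    """Remove a single trailing letter after a number, before sending the query."""
    return _TRAILING_LETTER.sub("", query)


print(strip_trailing_letter("Nokia BH-212V"))  # Nokia BH-212
print(strip_trailing_letter("Nokia X150ABC"))  # Nokia X150ABC (unchanged)
```

A rule this blunt will also rewrite terms where the letter matters (for example, "3G" becomes "3"), so whether it is good enough depends on the catalog.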
parsing strings into phrase queries
Right now if I have the query model:(Nokia BH-212V), the parser turns this into +(model:nokia model:"bh 212 v"). The problem is that I might have a model called Nokia BH-212, so this is completely missed. In my case, I would like my query to be +(model:nokia model:bh model:212 model:v).

This is my schema for the field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
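The following toy model, in Python, may help show why the parser produces a phrase here and why the PositionFilter suggested in the replies changes that. It is a simplified sketch, not Lucene's actual code: the names `build_clause` and `flatten_positions` are hypothetical, and the only behavior modeled is that one query chunk analyzed into tokens at several positions becomes a phrase, while tokens that all share one position become optional term clauses. A mix of the two (which Lucene handles with a multi-phrase query) is treated here as a plain phrase.

```python
def build_clause(field, tokens):
    """Build a query clause from (term, position_increment) pairs.

    The pairs stand for the analyzer output of ONE whitespace-separated
    query chunk, e.g. "BH-212V" after word-delimiter splitting.
    """
    terms = [term for term, _ in tokens]
    if len(terms) == 1:
        return f"{field}:{terms[0]}"
    # The first token opens a position; later tokens add one only if increment > 0.
    positions = 1 + sum(1 for _, inc in tokens[1:] if inc > 0)
    if positions > 1:
        phrase = " ".join(terms)
        return f'{field}:"{phrase}"'  # several positions: phrase query
    # All tokens stacked on one position: optional term clauses.
    return "(" + " ".join(f"{field}:{t}" for t in terms) + ")"


def flatten_positions(tokens):
    """Mimic a position filter with increment 0: the first token keeps its
    increment, every later token is stacked on the same position."""
    return [(term, inc if i == 0 else 0) for i, (term, inc) in enumerate(tokens)]


analyzed = [("bh", 1), ("212", 1), ("v", 1)]
print(build_clause("model", analyzed))                     # model:"bh 212 v"
print(build_clause("model", flatten_positions(analyzed)))  # (model:bh model:212 model:v)
```

Under these assumptions, adding such a filter at the end of the query analyzer chain would give the term-by-term form asked for in the message, at the cost of losing phrase matching on that field.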