Re: parsing strings into phrase queries

2010-02-22 Thread Lance Norskog
Thanks Robert, that helped.

-- 
Lance Norskog
goks...@gmail.com


Re: parsing strings into phrase queries

2010-02-18 Thread Robert Muir
I gave it a rough shot, Lance. If there's a better way to explain it, please
edit.

-- 
Robert Muir
rcm...@gmail.com


Re: parsing strings into phrase queries

2010-02-18 Thread Kevin Osborn
The PositionFilter worked great for my purpose, along with another filter that
I built.

In my case, my indexed data may be something like "X150". So, a query for
"Nokia X150" should match. But I don't want random matches on "x". However, if
my indexed data is "G7", I do want a query on "PowerShot G7" to match on "g"
and "7". So, a simple length filter will not do. Instead I built a custom
filter (that I am willing to contribute back) that filters out singletons that
are surrounded by longer tokens (3 or more characters by default). So,
"PowerShot G7" becomes "power shot g 7", but "Nokia X150" becomes "nokia 150".

And then I put the results of this into a PositionFilter. This allows "Nokia
X150ABC" to match against the "X150" part. So far I really like this for
partial part number searches. And then to boost exact matches, I used copyField
to create another field without PositionFilter, and then did an optional phrase
query on that.
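
A minimal sketch of the singleton filter described above, written in Python
rather than as a Lucene TokenFilter. The actual filter was not posted to the
thread, so the function name, the min_neighbor_len parameter, and the handling
of tokens at either end of the stream (kept, because they have a neighbor on
only one side) are assumptions for illustration. It expects tokens that have
already been split and lowercased.

```python
from typing import List


def drop_surrounded_singletons(tokens: List[str], min_neighbor_len: int = 3) -> List[str]:
    """Drop single-character tokens whose two neighbors are both long.

    Hypothetical reading of the filter in the message above: a token of
    length 1 is removed only when the tokens on BOTH sides have at least
    min_neighbor_len characters. Tokens at the start or end of the stream
    are always kept (assumption).
    """
    kept = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        is_singleton = len(tok) == 1
        has_both_neighbors = 0 < i < last
        if (
            is_singleton
            and has_both_neighbors
            and len(tokens[i - 1]) >= min_neighbor_len
            and len(tokens[i + 1]) >= min_neighbor_len
        ):
            continue  # singleton wedged between two long tokens: drop it
        kept.append(tok)
    return kept


# "PowerShot G7" after word-delimiter splitting and lowercasing
print(drop_surrounded_singletons(["power", "shot", "g", "7"]))
# "Nokia X150" after word-delimiter splitting and lowercasing
print(drop_surrounded_singletons(["nokia", "x", "150"]))
```

With these inputs the first call keeps all four tokens (the "g" sits next to
the short "7"), and the second drops "x" because it sits between "nokia" and
"150". A real TokenFilter would also have to decide what to do with position
increments when it drops a token; that part is not modeled here.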






Re: parsing strings into phrase queries

2010-02-18 Thread Otis Gospodnetic
This sounds useful to me!
Here's a pointer: http://wiki.apache.org/solr/HowToContribute


Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/






Re: parsing strings into phrase queries

2010-02-17 Thread Chris Hostetter

: take a look at PositionFilter

Right, there was another thread recently where almost the exact same issue 
was discussed...

http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html

..except that i was ignorant of the existence of PositionFilter when i 
wrote that message.



-Hoss
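
For readers wondering why PositionFilter helps here, the following is a toy
model in Python, not Lucene or Solr code. It assumes the behavior of the query
parser of that era: when the analyzer returns several tokens for one chunk of
query text at increasing positions, the parser builds a phrase query, and when
all tokens share one position it builds an OR of term queries. PositionFilter
sets the position increment of every token after the first to a fixed value
(0 by default), which puts them all at the same position. The function names
and the (term, increment) tuple representation are made up for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (term, position increment)


def position_filter(tokens: List[Token], position_increment: int = 0) -> List[Token]:
    """Toy PositionFilter: the first token keeps its increment, the rest get a fixed one."""
    return [
        (term, inc if i == 0 else position_increment)
        for i, (term, inc) in enumerate(tokens)
    ]


def build_query(field: str, tokens: List[Token]) -> str:
    """Simplified stand-in for the parser's choice between phrase and OR."""
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0][0]}"
    # Count distinct positions: each increment > 0 after the first token
    # moves to a new position.
    distinct_positions = 1 + sum(1 for _, inc in tokens[1:] if inc > 0)
    if distinct_positions == 1:
        # All tokens stacked at one position: treated like synonyms, OR'ed together.
        return " ".join(f"{field}:{term}" for term, _ in tokens)
    # Tokens at successive positions: treated as a phrase.
    phrase = " ".join(term for term, _ in tokens)
    return f'{field}:"{phrase}"'


# "BH-212V" after WordDelimiterFilter and lowercasing: three tokens in sequence
tokens = [("bh", 1), ("212", 1), ("v", 1)]
print(build_query("model", tokens))
print(build_query("model", position_filter(tokens)))
```

The first call prints model:"bh 212 v" and the second prints
model:bh model:212 model:v, which is the shape of query asked for at the start
of this thread. In a Solr schema this would correspond to adding
solr.PositionFilterFactory at the end of the query analyzer chain only.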



Re: parsing strings into phrase queries

2010-02-17 Thread Robert Muir
I think we can improve the docs/wiki to show this example use case. I
noticed that the wiki explanation for this filter gives a more complex
shingles example, which is interesting, but this seems to be a common
problem, so maybe we should add this use case.


-- 
Robert Muir
rcm...@gmail.com


Re: parsing strings into phrase queries

2010-02-17 Thread Lance Norskog
That would be great. After reading this and the PositionFilter class I
still don't know how to use it.


-- 
Lance Norskog
goks...@gmail.com


Re: parsing strings into phrase queries

2010-02-13 Thread Erick Erickson
I don't see a good way to fix this without some heuristic you'd have to
implement to munge your query. There's no good way for SOLR to intuit that
what you want is a partial match in this case. If you can create some
rules, like "remove any single letters after numbers in the query",
that would be good enough, you might get something satisfactory.
But I don't know of a way for SOLR to do this for you.

And if you *can't* write a good enough rule, this sounds like an
intractable problem.

Not much help, I know.
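
As a rough illustration of the kind of client-side rule suggested above, here
is a Python sketch that strips a single trailing letter that directly follows
a digit before the query is sent to Solr. The function name and the exact
regular expression are assumptions; whether such a rule is "good enough"
depends entirely on the part numbers in the index.

```python
import re

# A single ASCII letter that directly follows a digit and ends the word,
# e.g. the "V" in "BH-212V". Letters followed by more letters are left alone.
_TRAILING_LETTER = re.compile(r"(?<=\d)[A-Za-z]\b")


def munge_query(query: str) -> str:
    """Hypothetical pre-processing rule: drop single letters after numbers."""
    return _TRAILING_LETTER.sub("", query)


print(munge_query("Nokia BH-212V"))
print(munge_query("Nokia X150ABC"))
```

The first call returns "Nokia BH-212" and the second leaves "Nokia X150ABC"
unchanged, since "ABC" is more than one letter.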







parsing strings into phrase queries

2010-02-12 Thread Kevin Osborn
Right now if I have the query model:(Nokia BH-212V), the parser turns this into
+(model:nokia model:"bh 212 v"). The problem is that I might have a model
called Nokia BH-212, so this is completely missed. In my case, I would like my
query to be +(model:nokia model:bh model:212 model:v).

This is my schema for the field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>