Re: Parse eDisMax queries for keywords
Hi Jack,

thanks for your reply. OK, in this case I agree that enriching the query in the application layer is a good idea. We are still a bit puzzled about what the enriched query should look like. I'll post here once we have found a solution. If somebody has suggestions, I'd be happy to hear them.

Mirko

2013/11/21 Jack Krupansky <j...@basetechnology.com>:

> The query parser does its own tokenization and parsing before your analyzer tokenizer and filters are called, ensuring that only one whitespace-delimited token is analyzed at a time.
>
> You're probably best off having an application layer preprocessor for the query that enriches the query in the manner that you're describing.
>
> Or, simply settle for a heuristic approach that may give you 70% of what you want using only existing Solr features on the server side.
>
> -- Jack Krupansky
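One possible shape for the enriched query, as a minimal sketch rather than a confirmed solution from this thread: the application detects the keyword "season" plus a number in the raw user input and adds either a boost query (`bq`) or a filter query (`fq`) on the `season` field, while `qf` is reduced to `title`. The function name `enrich_query`, the boost factor of 5, and the assumption that the `season` field is indexed with plain numbers such as `1` are all hypothetical.

```python
import re

# Hypothetical helper: the names and the default boost are illustrative assumptions.
# Matches the keyword "season", optional whitespace, optional leading zeros, a number.
SEASON_RE = re.compile(r"\bseason\s*0*(\d+)\b", re.IGNORECASE)


def enrich_query(user_query, mode="boost", boost=5):
    """Build Solr request parameters from the raw user query.

    mode="boost"  adds a bq clause so matching seasons rank higher.
    mode="filter" adds an fq clause so only matching seasons are returned.
    """
    params = {
        "defType": "edismax",
        "qf": "title",      # assumption: only the title is searched as free text
        "q": user_query,    # the user's text is passed through unchanged
    }
    match = SEASON_RE.search(user_query)
    if match is None:
        return params       # no keyword found, nothing to enrich

    season = match.group(1)  # season number without leading zeros
    if mode == "filter":
        params["fq"] = f"season:{season}"
    else:
        params["bq"] = f"season:{season}^{boost}"
    return params


if __name__ == "__main__":
    print(enrich_query("Footitle season 1"))
    print(enrich_query("Footitle season 02", mode="filter"))
```

This sketch leaves the words "season" and the number in `q`. Whether they should be stripped before searching `title` depends on whether titles themselves contain such text, which the thread does not say.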
Parse eDisMax queries for keywords
Hi,

We would like to implement special handling for queries that contain certain keywords. Our particular use case: in the example query "Footitle season 1" we want to detect the keyword "season", get the subsequent number, and boost (or filter for) documents that match "1" on the field name="season".

We have two fields in our schema:

    <!-- titles contains titles -->
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>

    <fieldType name="text" class="solr.TextField" omitNorms="true">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ... -->
      </analyzer>
    </fieldType>

    <field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>

    <!-- season contains season numbers -->
    <fieldType name="season_number" class="solr.TextField" omitNorms="true">
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
      </analyzer>
    </fieldType>

Our idea was to use a keyword tokenizer and a regex on the season field to extract the season number from the complete query.

However, we use an ExtendedDisMax query parser in our search handler:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title season</str>
      </lst>
    </requestHandler>

The problem is that eDisMax tokenizes the query, so that our season field receives the tokens [Foo, season, 1] without any order, instead of the complete query.

How can we pass the complete (untokenized) query to the season field? We don't understand which tokenizer is used here and why our season field receives tokens instead of the complete query.

Or is there another approach to solve this use case with Solr?

Thanks,
Mirko
Re: Parse eDisMax queries for keywords
The query parser does its own tokenization and parsing before your analyzer tokenizer and filters are called, ensuring that only one whitespace-delimited token is analyzed at a time.

You're probably best off having an application layer preprocessor for the query that enriches the query in the manner that you're describing.

Or, simply settle for a heuristic approach that may give you 70% of what you want using only existing Solr features on the server side.

-- Jack Krupansky

-----Original Message-----
From: Mirko
Sent: Thursday, November 21, 2013 5:30 AM
To: solr-user@lucene.apache.org
Subject: Parse eDisMax queries for keywords
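To illustrate Jack's point about one whitespace-delimited token being analyzed at a time, here is a small simulation in Python rather than Solr itself. It applies the lowercase step and the regex from the `season_number` field type first to the whole query and then to each whitespace-separated token. The `analyze` function and the use of `str.split()` as a stand-in for the query parser's splitting are simplifying assumptions.

```python
import re

# Same pattern as the PatternReplaceFilterFactory in the original question.
PATTERN = re.compile(r".*(?:season) *0*([0-9]+).*")


def analyze(text):
    """Roughly mimic the season_number query analyzer on one input string."""
    lowered = text.lower()               # stands in for LowerCaseFilterFactory
    return PATTERN.sub(r"\1", lowered)   # stands in for replacement="$1"


query = "Footitle season 1"

# What the schema was designed for: the analyzer sees the complete query.
whole = analyze(query)

# What happens when the query is split on whitespace before analysis:
# each token goes through the analyzer separately.
per_token = [analyze(token) for token in query.split()]

print(whole)      # 1
print(per_token)  # ['footitle', 'season', '1']
```

No single token contains both the keyword and the number, so the pattern never matches and each token passes through unchanged apart from lowercasing.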