Re: Parse eDisMax queries for keywords

2013-11-25 Thread Mirko
Hi Jack,
thanks for your reply. OK, in this case I agree that enriching the query
in the application layer is a good idea. We are still a bit puzzled about
what the enriched query should look like. I'll post here once we have found
a solution. If anybody has suggestions, I'd be happy to hear them.

Mirko





Parse eDisMax queries for keywords

2013-11-21 Thread Mirko
Hi,
We would like to implement special handling for queries that contain
certain keywords. Our particular use case:

In the example query "Footitle season 1" we want to discover the keyword
"season", get the subsequent number, and boost (or filter for) documents
that match "1" on field name="season".

We have two fields in our schema:

<!-- titles contains titles -->
<field name="title" type="text" indexed="true" stored="true"
       multiValued="false"/>

<fieldType name="text" class="solr.TextField" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... -->
  </analyzer>
</fieldType>

<field name="season" type="season_number" indexed="true" stored="false"
       multiValued="false"/>

<!-- season contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
  </analyzer>
</fieldType>


Our idea was to use a Keyword tokenizer and a Regex on the season field
to extract the season number from the complete query.

However, we use an ExtendedDisMax query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title season
    </str>

  </lst>
</requestHandler>


The problem is that eDisMax tokenizes the query, so that our season field
receives the tokens [Foo, season, 1] one at a time, without any order,
instead of the complete query.

How can we pass the complete query (untokenized) to the season field? We
don't understand which tokenizer is used here and why our season field
receives tokens instead of the complete query.

Or is there another approach to solve this use case with Solr?

Thanks,
Mirko


Re: Parse eDisMax queries for keywords

2013-11-21 Thread Jack Krupansky
The query parser does its own tokenization and parsing before your analyzer
tokenizer and filters are called, ensuring that only one
whitespace-delimited token is analyzed at a time.
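
To illustrate the effect, here is a rough Python simulation, not Solr code. It assumes Python's re module behaves like the Java regex configured on the season field for this pattern, and it uses a plain whitespace split as a stand-in for the query parser's pre-tokenization. The analyze helper is hypothetical and only mimics the keyword tokenizer, lowercase filter and pattern replace filter from the original post.

```python
import re

# Python stand-in for the pattern on the season field's
# PatternReplaceFilterFactory (Java's "$1" is written r"\1" here).
SEASON_PATTERN = re.compile(r".*(?:season) *0*([0-9]+).*")


def analyze(text):
    """Mimic KeywordTokenizer + LowerCaseFilter + PatternReplaceFilter."""
    return SEASON_PATTERN.sub(r"\1", text.lower())


query = "Footitle season 1"

# What the schema was designed for: the whole query arrives as one token.
whole = analyze(query)

# What happens behind the query parser: the query is split on whitespace
# first, so the analyzer sees each piece alone and the pattern never matches.
per_token = [analyze(token) for token in query.split()]

print(whole)      # 1
print(per_token)  # ['footitle', 'season', '1']
```

In this simulation the number is extracted only when the analyzer sees the complete query, which matches the behaviour reported in the original post.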


You're probably best off having an application layer preprocessor for the 
query that enriches the query in the manner that you're describing.
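
A minimal sketch of what such a preprocessor might look like, assuming a Python application layer. The function name enrich_query, the keyword regex and the boost factor of 5 are hypothetical choices for illustration. bq (an eDisMax boost query) and fq (a filter query) are standard Solr request parameters. Whether the season field should then stay in qf is a separate decision that this sketch does not make.

```python
import re

# Hypothetical keyword rule: "season" followed by a number; leading zeros
# are ignored, as in the regex from the original post.
SEASON_RE = re.compile(r"\bseason\s*0*(\d+)\b", re.IGNORECASE)


def enrich_query(user_query, mode="boost", boost=5):
    """Build Solr request parameters from a raw user query.

    mode="boost" adds a bq clause, mode="filter" adds an fq clause.
    """
    params = {"defType": "edismax", "q": user_query.strip()}
    match = SEASON_RE.search(user_query)
    if match is None:
        # No keyword found: pass the query through unchanged.
        return params

    season = int(match.group(1))
    # Drop the keyword phrase so that only the remaining words are searched.
    # If nothing remains, fall back to the original query text.
    remaining = " ".join(SEASON_RE.sub(" ", user_query).split())
    params["q"] = remaining or user_query.strip()

    if mode == "filter":
        params["fq"] = f"season:{season}"
    else:
        params["bq"] = f"season:{season}^{boost}"
    return params


print(enrich_query("Footitle season 1"))
print(enrich_query("Footitle season 01", mode="filter"))
```

The returned dictionary can be sent as request parameters to the /select handler by whatever HTTP client the application already uses.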


Or, simply settle for a heuristic approach that may give you 70% of what 
you want using only existing Solr features on the server side.


-- Jack Krupansky
