RE: multi term analyzer error
Hi Erick,

Thanks for the detailed response!

My use case is exactly as you described in your 'Eric*' example. Our index and query analyzers replace a dash "-" with an underscore "_". So when a user searches for something that contains a dash and the query has a wildcard (for example Eyal-Naa*), nothing is found even if the term exists. The reason is, as you said, that Solr uses a different analyzer for wildcard queries.

Our solution was to add a multiterm analyzer that does the same thing as the query analyzer: replace the dash with an underscore. This does solve the issue, even though 'PatternReplaceCharFilterFactory' does not implement the 'MultiTermAwareComponent' interface.

But adding the new analyzer causes a new problem, and I don't think it is related to PatternReplaceCharFilterFactory. When an empty wildcard query is sent, such as just "*" to query the whole index, it fails with "analyzer returned no terms for multiTerm term *". These queries do work with the default analyzer, so I guess there is a way to handle them.

Thanks!
Eyal

Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com
www.exlibrisgroup.com

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, December 30, 2015 6:42 PM
To: solr-user
Subject: Re: multi term analyzer error
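For reference, a minimal sketch of the kind of field type described in this message, with the same dash-to-underscore char filter in the index, query, and multiterm chains. The field type name `text_dash`, the tokenizer choices, and the lowercase filter are assumptions for illustration and are not taken from the thread; only the idea of replacing "-" with "_" mirrors what was posted. KeywordTokenizerFactory is used in the multiterm chain because it emits the whole input as a single token, so a lone "*" should not be tokenized away. Whether that avoids the "analyzer returned no terms" error on a given Solr version is untested here.

```xml
<!-- Hypothetical field type: the name and the tokenizer choices are assumptions. -->
<fieldType name="text_dash" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters run on the raw input string, before tokenization. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="_"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="_"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Explicit chain for wildcard, prefix, and other multi-term queries. -->
  <analyzer type="multiterm">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="_"/>
    <!-- Emits the entire input as one token, so the term is never split or dropped. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```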
Re: multi term analyzer error
Right, you may be one of the few people to actually implement your own multiTerm analyzer function, despite the fact that this has been in the code for years!

Look at the factories and see whether they implement the "MultiTermAwareComponent" interface; PatternReplaceCharFilterFactory does _not_. Thus it can't be used in a multiTerm analysis chain.

A bit of background here. The whole "MultiTermAwareComponent" was implemented to handle simple cases that were causing endless questions. For instance, anything with a wildcard would do no analysis. Thus people would define a field with, say, LowerCaseFilterFactory and then ask "Why don't we find 'Eric*' when Erick is in the field?" The answer was that "wildcard terms are not sent through the analysis chain, you have to do those kinds of transformations in the client." This was not terribly satisfactory...

There are various sound reasons why "doing the right thing" with wildcards is very hard in the general case for a filter that breaks a single token into two or more tokens. Any filter that generates two or more tokens is impossible to get right. Does this mean both tokens should be wildcards? The first? The second? Neither? Any decision is the wrong decision. And don't even get me started on something like Ngrams or Shingles.

OK, finally answering your question. The only filters that are multi-term aware are ones that are _guaranteed_ to produce one and only one token from any input token. PatternReplaceCharFilterFactory cannot honor that contract so I'm pretty sure that's what's causing your error. Assuming the substitutions you're doing would work on the whole string, you might be able to use PatternReplaceCharFilterFactory since that operates on the whole input string rather than the tokens and thus could be used.

But I have to ask "why are you implementing a multiTerm analyzer"? What is the use-case you're trying to solve? Because from your example, it looks like you're trying to search over a string-type (untokenized) input, and if so this is not the right approach at all.

Best,
Erick

On Tue, Dec 29, 2015 at 10:16 PM, Eyal Naamati <eyal.naam...@exlibrisgroup.com> wrote:
> Hi Ahmet,
> Yes there is a space in my example.
> This is my multiterm analyzer:
>
> pattern="\-" replacement="\_" />
>
> Thanks!
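The background in this message can be illustrated with a sketch of a field type that defines no explicit multiterm analyzer. The field type name `text_lower` and the gram sizes are assumptions for illustration. In this setup, as described above, wildcard terms only pass through components that implement MultiTermAwareComponent, so the lowercase filter applies to 'Eric*' while a filter that can emit several tokens per input token is left out.

```xml
<!-- Hypothetical field type with a single analyzer and no type="multiterm" section. -->
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- One token in, one token out: multi-term aware, so it is also applied to wildcard terms such as Eric*. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Can emit several tokens per input token: not multi-term aware, so wildcard terms bypass it. -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="4"/>
  </analyzer>
</fieldType>
```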
Re: multi term analyzer error
Hi Eyal,

What is your analyzer definition for multi-term?
In your example, is the star character separated from the term by a space?

Ahmet

On Tuesday, December 29, 2015 3:26 PM, Eyal Naamati <eyal.naam...@exlibrisgroup.com> wrote:

Hi,

I added a multi-term analyzer to my analysis chain, and it works as I expect. However, for some queries (for example '*' or 'term *') I get an exception "analyzer returned no terms for multiTerm term". These queries work when I don't customize a multi-term analyzer.

My question: is there a way to handle this in the analyzer configuration (in my schema.xml)? I realize that I can also change the query I am sending the analyzer, but that is difficult for me since there are many places in our program that use this.

Thanks!

Eyal Naamati
RE: multi term analyzer error
Hi Ahmet,

Yes there is a space in my example.
This is my multiterm analyzer:

pattern="\-" replacement="\_" />

Thanks!

Eyal Naamati

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Tuesday, December 29, 2015 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: multi term analyzer error

Hi Eyal,

What is your analyzer definition for multi-term?
In your example, is the star character separated from the term by a space?

Ahmet