Re: Wildcards and fuzzy/phonetic query

2012-12-11 Thread Haagen Hasle

Thank you!  I actually tried to look through Jira, but I didn't focus on the 
minor issues.  For me, this is quite critical.. :-)

Any chance of merging this into the 4.0.1 release?


Regards, Haagen

On 11 Dec 2012 at 12:45, Ahmet Arslan wrote:

 Lowercasing actually seems to work with Wildcard queries,
 but not with fuzzy queries.  Are there any reasons why
 I should experience such a difference?
 
 Hi Haagen,
 
 Yonik added this recently. https://issues.apache.org/jira/browse/SOLR-4076
 



Re: Wildcards and fuzzy/phonetic query

2012-12-10 Thread Haagen Hasle

It's been two months since I asked about wildcards and phonetic filters, and 
finally the task of upgrading Solr to version 4.0 was prioritized in our 
project.  So the last couple of days I've been working on it.  Another team 
member upgraded Solr from 3.4 to 4.0, and I've been making changes to 
schema.xml to accommodate the new multiterm functionality.

However, it doesn't seem to work.  Lowercasing is still not applied when I do a 
fuzzy search: neither through the regular index analyzer and its support for 
MultiTermAwareComponent, nor when I try to define a dedicated multiterm 
analyzer.

Do I have to do anything special to enable the multiterm functionality in Solr 
4.0?


Regards, 

Hågen

On 8 Oct 2012 at 18:09, Erick Erickson wrote:

 whether phonetic filters can be multiterm aware:
 
 I'd be leery of this, as I basically don't quite know how that would
 behave. You'd have to ensure that the algorithms changed the
 first parts of the words uniformly, regardless of what followed. I'm
 pretty sure that _some_ phonetic algorithms do not follow this
 pattern, e.g. eric wouldn't necessarily have the same beginning
 as erickson. That said, some of the algorithms _may_ follow this
 rule and might be OK candidates for being MultiTermAware.
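To make the prefix concern concrete, here is a minimal sketch in Python. It uses standard American Soundex purely as a stand-in: the thread does not say which phonetic algorithm is in use, and the `soundex` function below is written for this illustration and is not Solr's implementation. It checks whether the code of a typed prefix is itself a prefix of the code of the full name, which is what a wildcard over phonetic codes would need.

```python
# Standard American Soundex, used only as an illustrative phonetic algorithm.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character, zero-padded Soundex code of a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            encoded += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            previous = digit
    return (encoded + "000")[:4]


full = soundex("erickson")
for prefix in ("er", "eri", "eric"):
    code = soundex(prefix)
    # Would a wildcard on the encoded prefix match the encoded full name?
    print(prefix, code, full.startswith(code))
```

With this sketch, eric encodes to E620 and erickson to E625, so none of the padded prefix codes is a prefix of the full code. If the zero padding is stripped (E6, E6, E62), all three become prefixes of E625, which hints at why the answer is likely to differ from one algorithm, and one configuration, to the next.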
 
 But you don't need this in order to try it out. See the "Expert Level
 Schema Possibilities" section at:
 http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
 
 You can define your own analysis chain for wildcards as part of your
 fieldType definition and include whatever you want, whether or not it's
 MultiTermAware, and it will be applied at query time. Use the
 analyzer type=query entry as a basis. _But_ you shouldn't include anything
 in this section that produces more than one output per input token. Note:
 token, not field. For example, a really bad candidate for this section is
 WordDelimiterFilterFactory. If you use the admin/analysis page (which
 you'll get to know intimately), look at a type that has
 WordDelimiterFilterFactory in its chain, and enter something like
 erickErickson1234, you'll see what I mean. Make sure to check the
 verbose box.
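As a rough illustration of the advice above, the Python sketch below holds a hypothetical fieldType fragment with a dedicated multiterm analyzer and checks that section for filters that split tokens. The field type name `name_text`, the choice of tokenizers, and the `MULTI_OUTPUT` set are assumptions made for this example; only WordDelimiterFilterFactory is named in the thread as a bad candidate.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; "name_text" is an illustrative name.
FIELD_TYPE = """
<fieldType name="name_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

# Filters assumed, for this sketch, to emit several tokens per input token.
MULTI_OUTPUT = {"solr.WordDelimiterFilterFactory"}


def risky_multiterm_filters(field_type_xml):
    """List filters in the multiterm analyzer that may emit several tokens."""
    root = ET.fromstring(field_type_xml.strip())
    risky = []
    for analyzer in root.findall("analyzer"):
        if analyzer.get("type") != "multiterm":
            continue
        for flt in analyzer.findall("filter"):
            if flt.get("class") in MULTI_OUTPUT:
                risky.append(flt.get("class"))
    return risky


print(risky_multiterm_filters(FIELD_TYPE))
```

The check is deliberately naive: it only knows about the classes listed in `MULTI_OUTPUT`, so the admin/analysis page remains the better way to see what a chain actually emits.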
 
 If you can determine that some of the phonetic algorithms _should_ be
 MultiTermAware, please feel free to raise a JIRA and we can discuss. I
 suspect it'll be on a case-by-case basis.
 
 Best
 Erick
 
 On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
 haagenha...@gmail.com wrote:
 Hi!
 
 I'm quite new to Solr. I was recently asked to help out on a project where 
 the previous Solr person quit quite suddenly.  I've noticed that some of 
 our searches don't return the expected results, and I'm hoping you guys can 
 help me out.
 
 We've indexed a lot of names, and would like to search for a person in our 
 system using these names.  We previously used Oracle Text for this, and we 
 find that Solr is much faster.  So far so good! :)  But when we try to 
 use wildcards, things start to go wrong.
 
 We're using Solr 3.4, and I see that some of our problems are solved in 3.6.
 Ref SOLR-2438:
 https://issues.apache.org/jira/browse/SOLR-2438
 
 But we would also like to be able to combine wildcards with fuzzy searches, 
 and wildcards with a phonetic filter.  I don't see anything about phonetic 
 filters in SOLR-2438 or SOLR-2921.  
 (https://issues.apache.org/jira/browse/SOLR-2921)
 Is it possible to make the phonetic filters MultiTermAware?
 
 Regarding fuzzy queries, in Oracle Text I can search for chr% (chr* in 
 Solr) and find both christian and kristian.  As far as I understand, this 
 is not possible in Solr, because WildcardQuery and FuzzyQuery cannot be 
 combined.  Is this correct, or have I misunderstood something?  Are there 
 any workarounds or filter combinations I can use to achieve the same result?  
 I've seen people suggest using a boolean query to combine the two, but I 
 don't really see how that would solve my chr* problem.
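For reference, the boolean combination mentioned above can be sketched in Python as follows. The field name `name` and the helper `wildcard_or_fuzzy` are hypothetical; only the `*` and `~` operators of the query syntax are taken as given.

```python
from urllib.parse import urlencode


def wildcard_or_fuzzy(field, prefix, fuzzy_term):
    """Build one query string that ORs a prefix clause with a fuzzy clause."""
    return f"{field}:{prefix}* OR {field}:{fuzzy_term}~"


query = wildcard_or_fuzzy("name", "chr", "christian")
print(query)
# URL-encode the query as the q parameter of a select request.
print(urlencode({"q": query}))
```

This also shows the limitation raised in the question: the fuzzy clause needs a complete term such as christian, so the combination only helps when the user supplies one. The chr* clause on its own is still an exact prefix match and will not reach kristian.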
 
 As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
 asking about only shows my ignorance..
 
 
 Regards, Hågen



Re: Wildcards and fuzzy/phonetic query

2012-12-10 Thread Haagen Hasle

Lowercasing actually seems to work with Wildcard queries, but not with fuzzy 
queries.  Are there any reasons why I should experience such a difference?
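One possible stopgap, sketched in Python, is to lowercase the term of a fuzzy clause on the client before sending it. This assumes the index-time chain does nothing to the term beyond lowercasing, and the helper `lowercase_fuzzy` is hypothetical; it is not part of Solr or of any client library.

```python
import re

# Optional "field:" prefix, a single term, "~", and an optional number.
FUZZY = re.compile(
    r"^(?P<field>\w+:)?(?P<term>[^~\s:]+)~(?P<param>\d*(?:\.\d+)?)$"
)


def lowercase_fuzzy(clause):
    """Lowercase the term of a single fuzzy clause; leave anything else as is."""
    match = FUZZY.match(clause)
    if not match:
        return clause
    field = match.group("field") or ""
    return f"{field}{match.group('term').lower()}~{match.group('param')}"


print(lowercase_fuzzy("name:Kristian~"))
print(lowercase_fuzzy("Kristian~0.7"))
print(lowercase_fuzzy("Chr*"))
```

Only simple clauses are handled: quoted phrases, escaped characters and nested boolean expressions pass through unchanged, so this is a workaround rather than a replacement for query-time analysis.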


Regards, Haagen





Re: Wildcards and fuzzy/phonetic query

2012-10-09 Thread Haagen Hasle

I used the admin/analysis page (great tip, I had never used it before - thank 
you!) and it seems to me that the DoubleMetaphone filter converts Hågen to 
both JN and KN.  Will that crash the Solr analysis if I try to include this 
filter in the multiterm-analysis?
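The one-output-per-input constraint behind this question can be modelled with a toy analysis chain in Python. Everything here is a stand-in: `toy_phonetic` simply returns the two codes reported above for hågen, and `analyze_multiterm` models the rule described earlier in the thread, not Solr's actual behaviour when the rule is broken.

```python
def lowercase(tokens):
    """Filter that emits exactly one token per input token."""
    return [token.lower() for token in tokens]


def toy_phonetic(tokens):
    """Hypothetical filter that emits two codes for one known input."""
    table = {"hågen": ["JN", "KN"]}
    out = []
    for token in tokens:
        out.extend(table.get(token, [token]))
    return out


def analyze_multiterm(term, chain):
    """Run a term through the chain and insist on a single resulting term."""
    tokens = [term]
    for step in chain:
        tokens = step(tokens)
    if len(tokens) != 1:
        raise ValueError(f"expected one term for {term!r}, got {tokens}")
    return tokens[0]


print(analyze_multiterm("Hågen", [lowercase]))
try:
    analyze_multiterm("Hågen", [lowercase, toy_phonetic])
except ValueError as error:
    print(error)
```

A wildcard or fuzzy clause is built around a single term, so a chain that returns both JN and KN leaves no obvious single term to build it from. That is the reason for the warning about filters that produce more than one output.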

Do you know where I can find out more about combining wildcard and fuzzy in the 
same query?  When you say you don't think it is possible, do you mean it is not 
implemented in Solr today, or it can't be implemented because it is technically 
impossible or functionally doesn't make sense? :)  

I wrote in an answer to Otis that I'd like to try to combine fuzzy with Ngram 
as well.  Do you know if that is possible and makes any sense?


Thanks to everyone for quick and good answers, I really appreciate it!


Regards, Hågen

On 8 Oct 2012 at 21:35, Erick Erickson wrote:

 To answer your first question, yes, you've got it right. If you define
 a multiterm section in your fieldType, whatever you put in that section
 gets applied whether the underlying class is MultiTermAware or not.
 Which means you can shoot yourself in the foot really bad <G>...
 
 (…)
 
 Fuzzy searches + wildcards. I don't think you can do that reasonably, but
 I'm not entirely sure.
 
 Best
 Erick