Re: Cannot get like exact searching to work

Aaron Zeckoski Thu, 11 Feb 2010 02:11:24 -0800

On Thu, Feb 11, 2010 at 8:39 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I am using SOLR 1.3 and my server is
>> embedded and accessed using SOLRJ.
>> I would like to setup my searches so that exact matches are
>> the first
>> results returned, followed by near matches, and finally
>> token based
>> matches.
>> For example, if I have a summary field in schema which is
>> created
>> using copyField from a bunch of other fields:
>> "My item title, keyword, other, stuff"
>>
>> I want this search to match the item above first and
>> foremost:
>> 1) "My item title*"
>>
>> Then this one:
>> 2) "my item*"
>
> Wildcards inside phrases are not supported by default. You can use SOLR-1604
> for that in Solr 1.4.0, but I am not sure it will work with 1.3. Can you try?


I might be able to try this out, though in general the project has a
policy of only using released code (no trunk/unstable).
https://issues.apache.org/jira/browse/SOLR-1604
It looks like the kind of searching I want to do is not really
supported in SOLR by default though. Is that correct?


>> I tried creating a field to hold exact match data
>> (summaryExact) which
>> actually works if I paste in the precise text but stops
>> working as
>> soon as I add any wildcard to it.
>
> Your <fieldType name="exact" definition is wrong. You can directly use the
> string field type, which is not analyzed/tokenized. The string definition is:
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" 
> omitNorms="true"/>

I thought that was what my exact definition was doing except I also
want the exact field to be lowercased and trimmed (which I don't want
for all strings). Can you explain what is wrong with the current
definition so I can fix it?


>> I could not quite figure out which tokenizer to use if I
>> don't want
>> any tokens created but just want to trim and lowercase the
>> string so
>> let me know if you have ideas on this.
>
> A KeywordTokenizerFactory + TrimFilterFactory + LowerCaseFilterFactory
> combination can do that, but punctuation won't be removed between tokens.
>
>> Basically, I want something
>> similar to DB "like" matching without case sensitivity and
>> probably
>> trimmed as well. I don't really want the field to be
>> tokenized though.
>
> Your examples suggest you want something like a startsWith search? Can you
> explain in more detail?

What I really want is the equivalent of a match like this along with
the normal tokenized matching (where the query has been lowercased and
trimmed as well):
select * from blah where lower(column) like '%query%';
I think this is called a phrase match or something like that. However,
wildcards cannot be used at the beginning of a query, so I guess I can
live with only being able to do startsWith-type matching until that is
fixed. For now I have tried to do that using this:
query = (summary:"my item" || summaryExact:"my item*"^3)
but I would do this if I could:
query = (summary:"my item" || summaryExact:"*my item*"^3)

The idea is that a "phrase" match would be boosted over the normal
token matches and would show up first in the listing. Let me know if
more examples would help. I am happy to provide them.
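
For reference, here is a minimal sketch of how the startsWith form of the
query might be assembled on the client before handing it to SolrJ. It rests
on several assumptions that are not confirmed in this thread: that
summaryExact is indexed as a single lowercased, trimmed token; that prefix
terms are not run through the field's analyzer (hence the client-side
lowercasing); and that the query parser accepts backslash-escaped whitespace
inside a term. The class and method names (LikeQueryBuilder, escapeTerm,
build) are hypothetical. The wildcard is left outside any quotes, since
wildcards inside a quoted phrase are not expanded by default.

```java
import java.util.Locale;

public class LikeQueryBuilder {

    // Characters the Lucene-style query parser treats as syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;";

    /** Trims and lowercases the input, then backslash-escapes syntax
     *  characters and whitespace so the parser sees a single term. */
    public static String escapeTerm(String input) {
        String normalized = input.trim().toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (char c : normalized.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds: tokenized phrase match OR boosted prefix match on the exact field. */
    public static String build(String userInput) {
        // Escape backslashes and quotes so the phrase stays inside its quotes.
        String phrase = userInput.trim().replace("\\", "\\\\").replace("\"", "\\\"");
        return "summary:\"" + phrase + "\" OR summaryExact:"
                + escapeTerm(userInput) + "*^3";
    }

    public static void main(String[] args) {
        // prints: summary:"My Item" OR summaryExact:my\ item*^3
        System.out.println(build("  My Item "));
    }
}
```

The resulting string could then be passed as the query text to SolrJ. Whether
the escaped prefix term actually matches depends on how summaryExact is
analyzed at index time, so treat this as a starting point rather than a
verified fix.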


> Also, your <fieldType name="name" class="solr.StrField" ..> declaration is
> wrong. It should use class="solr.TextField".

OK, I will see if I can figure out how to correct that.

Thanks for all the help so far
-AZ


-- 
Aaron Zeckoski (azeckoski (at) vt.edu)
Senior Research Engineer - CARET - University of Cambridge
https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
