Re: Searching w/explicit Multi-Word Synonym Expansion

Jack Krupansky Wed, 17 Jul 2013 09:28:56 -0700

By all means, feel free to write about how users can in fact do custom code for Solr, but just keep a clear distinction between what could be developed and what is actually available off the shelf.

Yes, this list does have a mix of pure users and those who are willing to customize code as well. I didn't mean to discourage or denigrate the latter, just to highlight that doing custom code is not the same as solutions being available off the shelf.


-- Jack Krupansky

-----Original Message----- From: Roman Chyla
Sent: Wednesday, July 17, 2013 12:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion

Since I can't see into the heads of the users, I could make different
assumptions - but OK, it seems reasonable that only a minority of users here
are actually willing to do more (btw, I've received coding advice in the past
here on this list). I am working under the assumption that Lucene/SOLR devs
are swamped (there are always more requests and many unclosed JIRA issues), so
where else would they get a helping hand than from users of this list? Users
like me, for example.

roman

On Wed, Jul 17, 2013 at 11:59 AM, Jack Krupansky <j...@basetechnology.com> wrote:

Remember, this is the "users" list, not the "dev" list. Users want to know
what they can do and use off the shelf today, not what "could" be
developed. Hopefully, the situation will be brighter in six months or a
year, but today... is today, not tomorrow.

(And, in fact, users can use LucidWorks Search for query-time phrase
synonyms, off-the-shelf, today, no patches required.)


-- Jack Krupansky

-----Original Message----- From: Roman Chyla
Sent: Wednesday, July 17, 2013 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion

OK, let's do a simple test instead of making claims - take your Solr
instance, anything at version 4.0 or later.

In your schema.xml, pick a field and add the synonym filter:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory" />

in your synonyms.txt, add these entries:

hubble\0space\0telescope, HST

ATTENTION: the \0 is a null byte; it must be written as an actual null byte!
You can do it with:

python -c "print(\"hubble\0space\0telescope,HST\")" > synonyms.txt
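
If the shell one-liner is awkward on your platform, the same file can be written from a small script. This is only a sketch under stated assumptions: the helper name write_synonyms and the temporary output path are made up for illustration, and the script writes a single rule with no space after the comma, as in the one-liner above.

```python
import os
import tempfile


def write_synonyms(path, phrase_tokens, synonym):
    """Write one synonym rule whose multi-word side is joined by real NUL bytes.

    Hypothetical helper for illustration; not part of Solr.
    """
    # b"\x00" is the actual null byte that the file must contain.
    left = b"\x00".join(token.encode("utf-8") for token in phrase_tokens)
    line = left + b"," + synonym.encode("utf-8") + b"\n"
    # Binary mode, so nothing translates or strips the null bytes.
    with open(path, "wb") as handle:
        handle.write(line)
    return line


if __name__ == "__main__":
    # Written to a temporary directory here; copy it into your conf/ directory.
    target = os.path.join(tempfile.mkdtemp(), "synonyms.txt")
    write_synonyms(target, ["hubble", "space", "telescope"], "HST")
    with open(target, "rb") as handle:
        data = handle.read()
    print(data.count(b"\x00"), "null bytes written to", target)
```

Reading the file back in binary mode is a quick way to confirm that the separators really are null bytes and not the two characters backslash and zero.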

send a phrase query q=field:"hubble space telescope"&debugQuery=true

if you have done it right, you will see 'HST' is in the list - this means
solr is able to recognize the multi-token synonym! As far as recognition is
concerned, there is no need for more work on FST.
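
The check can also be scripted. The following is a rough sketch, not a tested recipe: the base URL (including the core name) and the field name are placeholders you must replace, and it assumes the JSON response writer (wt=json) with the parsed query reported under debug/parsedquery. The comparison ignores case because the filter above sets ignoreCase="true".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_debug_url(base_url, field, phrase):
    """Build a select URL for a phrase query with debugQuery switched on."""
    params = {
        "q": '%s:"%s"' % (field, phrase),
        "debugQuery": "true",
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def synonym_in_debug(response, synonym):
    """Return True if the synonym shows up in the parsed query of the debug output."""
    parsed = response.get("debug", {}).get("parsedquery", "")
    return synonym.lower() in str(parsed).lower()


def check(base_url, field, phrase, synonym):
    """Send the query to a running Solr and report whether the synonym was added."""
    with urlopen(build_debug_url(base_url, field, phrase)) as resp:
        return synonym_in_debug(json.load(resp), synonym)


if __name__ == "__main__":
    # Placeholder address and field name; adjust both to your installation.
    print(build_debug_url("http://localhost:8983/solr", "field",
                          "hubble space telescope"))
```

Calling check(...) with your own base URL and field should return True once the synonym filter is picked up for that field.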

I have written a big unit test that proves the point (9 months ago,
LUCENE-4499), making no changes to the way the FST works. What is missing is
the query parser that can take advantage of it - another JIRA issue.

I'll repeat my claim now: the solution(s) are there, they solve the problem
completely - they are not inside one JIRA issue, but they are there. They
need to be proven wrong, NOT proclaimed incomplete.


roman


On Wed, Jul 17, 2013 at 10:22 AM, Jack Krupansky <j...@basetechnology.com> wrote:

To the best of my knowledge, there is no patch or collection of patches
which constitutes a "working solution" - just partial solutions.

Yes, it is true, there is some FST work underway (active??) that shows
promise depending on query parser implementation, but again, this is all a
longer-term future, not a "here and now". Maybe in the 5.0 timeframe?

I don't want anyone to get the impression that there are off-the-shelf
patches that completely solve the synonym phrase problem. Yes, progress is
being made, but we're not there yet.

-- Jack Krupansky

-----Original Message----- From: Roman Chyla
Sent: Wednesday, July 17, 2013 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion

Hi all,

What I find very 'sad' is that Lucene/SOLR contain all the necessary
components for handling multi-token synonyms; the Finite State Automaton
works perfectly for matching these items; the biggest problem is IMO the
old query parser, which splits things on spaces and doesn't know how to be
smarter.

THIS IS A LONG-TIME PROBLEM - THERE EXIST SEVERAL WORKING SOLUTIONS (but
none was committed...sigh, we are re-inventing the wheel all the time...)

LUCENE-1622
LUCENE-4381
LUCENE-4499


The problem of synonym expansion is more difficult because of the parsing -
the default parsers are not flexible and they split on whitespace -
recently I have proposed a solution which also makes multi-token
synonym expansion simple

this is the ticket:
https://issues.apache.org/jira/browse/LUCENE-5014


that query parser is able to split on spaces, then look back, do the second
pass to see whether to expand with synonyms - and even discover different
parse paths and construct different queries based on that. if you want to
see some complex examples, look at:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java
- e.g. lines 373, 483
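
To make the "split on spaces, then look back" idea concrete, here is a toy sketch. It is NOT the LUCENE-5014 parser and not how Solr implements anything; the function names and the dict-based synonym table are invented purely for illustration, and it only handles the simplest case (longest match first, no overlapping parse paths).

```python
def expand_query(query, synonyms):
    """Split on whitespace, then look over the tokens for multi-word synonyms.

    `synonyms` maps a tuple of lowercased tokens to a list of alternatives.
    Returns a list of groups; each group lists the alternatives at that position.
    """
    tokens = query.split()
    longest = max((len(key) for key in synonyms), default=1)
    groups = []
    i = 0
    while i < len(tokens):
        # Second pass over the split tokens: try the longest window first.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            window = tuple(t.lower() for t in tokens[i:i + size])
            if window in synonyms:
                original = " ".join(tokens[i:i + size])
                groups.append([original] + list(synonyms[window]))
                i += size
                break
        else:
            # No synonym starts here; keep the single token as it is.
            groups.append([tokens[i]])
            i += 1
    return groups


def to_query_string(groups):
    """Render the groups as a query, quoting multi-word alternatives as phrases."""
    parts = []
    for alternatives in groups:
        rendered = ['"%s"' % a if " " in a else a for a in alternatives]
        if len(rendered) == 1:
            parts.append(rendered[0])
        else:
            parts.append("(" + " OR ".join(rendered) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    table = {("hubble", "space", "telescope"): ["HST"]}
    print(to_query_string(expand_query("hubble space telescope images", table)))
    # prints: ("hubble space telescope" OR HST) images
```

A real parser also has to deal with overlapping matches and alternative parse paths, which is what the test cases linked above exercise.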


Lucene/SOLR developers are already doing great work and have much to do -
they need help from everybody who is able to apply a patch, test it and
report back to JIRA.

roman



On Wed, Jul 17, 2013 at 9:37 AM, dmarini <david.marini...@gmail.com>
wrote:

iorixxx,

Thanks for pointing me in the direction of the QueryElevation component. If
it did not require that the target documents be keyed by the unique key
field it would be ideal, but since our Sku field is not the Unique field (we
have an internal id which serves as the key while this is the client's key)
it doesn't seem like it will match unless I make a larger scope change.

Jack,

I agree that out of the box there hasn't been a generalized solution for
this yet. I guess what I'm looking for is confirmation that I've gone as far
as I can properly and from this point need to consider using something like
the HON custom query parser component (which we're leery of using because
from my reading it solves a specific scenario that may overcompensate what
we're attempting to fix). I would personally rather stay IN solr than add
custom .jar files from around the web if at all possible.

Thanks for the replies.

--Dave





--
View this message in context:
http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469p4078610.html

Sent from the Solr - User mailing list archive at Nabble.com.
