On Jan 10, 2011, at 5:04 PM, lee carroll wrote:

> Hi Grant,
> 
> It's a search relevancy problem. For example:
> 
> a document about london reads like
> 
> London is not very good for a peaceful break.
> 
> we analyse this at the (i can't remember the technical term, is it the
> lexical level? bloody hell, i think you may have written the book!) level,
> which produces tokens in our index of, say,
> 
> "London good peaceful holiday"
> 
> users search for cities which would be nice for them to take a holiday in.
> say the search is
> "good for a peaceful break"
> 
> and bang london is top. talk about a relevancy problem :-)

First question, why are you getting rid of "not"?  Despite its reputation as a 
"stopword", it does carry a significant amount of meaning for you.  Then, you 
could probably do some phrase-based searching that would help in some cases.
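
A minimal sketch of the stopword point, written as a toy analyzer in Python
rather than a real Solr analysis chain. The stopword sets and the analyze()
helper are made up for illustration; they are not what Solr ships. It only
shows that once "not" is taken off the stopword list, the token survives into
the index where phrase queries or later processing can see it.

```python
import re

# Hypothetical stopword lists for illustration only.
DEFAULT_STOPWORDS = {"a", "is", "for", "very", "not"}
STOPWORDS_KEEPING_NOT = DEFAULT_STOPWORDS - {"not"}


def analyze(text, stopwords):
    """Lowercase the text, split on non-letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]


doc = "London is not very good for a peaceful break."

# With "not" treated as a stopword the negation disappears entirely.
print(analyze(doc, DEFAULT_STOPWORDS))
# ['london', 'good', 'peaceful', 'break']

# With "not" kept, the token is available to whatever runs next.
print(analyze(doc, STOPWORDS_KEEPING_NOT))
# ['london', 'not', 'good', 'peaceful', 'break']
```

Keeping the token does not fix ranking on its own, since a plain term query
for "good" still matches. It just stops the information being thrown away.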

> 
> now i was thinking of using phrase matches in the synonyms file but is that
> the best approach or could nlp help here?

I suppose it could.  During indexing, you could detect that the sentence has a 
negative connotation and change it to be "bad for a peaceful break" or something 
like that.  I'm not aware of any system that does that.  You could also use some 
sentiment analysis to determine that the sentence is negative and then tag it 
as negative such that your query takes that into account.  Payloads and/or 
marker tokens would likely help here.
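
To make the marker token idea a bit more concrete, here is a rough sketch in
Python, not Lucene code. Everything in it is an assumption for illustration:
the cue list, the NOT_ prefix, the one-token window, and the toy score()
function. The idea is that a term following a negation cue is indexed as a
different token, so a query for "good" no longer matches "not very good", and
the query side can penalise the negated form.

```python
import re

NEGATION_CUES = {"not", "no", "never"}       # assumed cue words
SKIPPED = {"a", "is", "for", "the", "very"}  # assumed stopwords/intensifiers
WINDOW = 1  # number of content tokens after a cue that get the marker


def analyze_with_negation(text):
    """Tokenize and prefix terms that follow a negation cue with NOT_."""
    out = []
    remaining = 0
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in NEGATION_CUES:
            remaining = WINDOW  # start marking the next content token(s)
            continue
        if tok in SKIPPED:
            continue
        if remaining > 0:
            out.append("NOT_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out


def score(query, doc_tokens):
    """Toy score: +1 per matching term, -1 per term the document negates."""
    q = analyze_with_negation(query)
    doc = set(doc_tokens)
    hits = sum(1 for t in q if t in doc)
    negated = sum(1 for t in q if "NOT_" + t in doc)
    return hits - negated


london = analyze_with_negation("London is not very good for a peaceful break.")
bath = analyze_with_negation("Bath is good for a peaceful break.")
query = "good for a peaceful break"

print(london)                # ['london', 'NOT_good', 'peaceful', 'break']
print(score(query, london))  # 1
print(score(query, bath))    # 3
```

In a real index the marker could equally be carried as a payload on the term
instead of a rewritten token; the fixed window here is the crudest possible
way of deciding how far a negation reaches.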

-Grant


> 
> cheers lee
> 
> 
> 
> 
> On 10 January 2011 18:21, Grant Ingersoll <gsing...@apache.org> wrote:
> 
>> 
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>> 
>>> Hi
>>> 
>>> I'm indexing a set of documents which have a conversational writing style.
>>> In particular the authors are very fond
>>> of listing facts in a variety of ways (this is to keep a human reader
>>> interested) but it's causing my index trouble.
>>> 
>>> For example instead of listing facts like: the house is white, the castle is
>>> pretty.
>>> 
>>> We get the house is the complete opposite of black and the castle is not
>>> ugly.
>>> 
>>> What are the best approaches to resolve these sorts of issues? Even if it's
>>> just handling "not" correctly, that would be a good start.
>>> 
>> 
>> Hmm, good problem.  I guess I'd start by stepping back and asking what
>> problem you are trying to solve.  You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users are expecting to do with this information.
>> Is this a pure search app?  Is it something else that is just backed by
>> Solr but the user would never do a search?
>> 
>> Do you have a relevance problem?  Also, what is your notion of handling
>> "not" correctly?  In other words, more details are welcome!
>> 
>> -Grant
>> 
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
