Wow, thank you Trey, great information! We are a Fusion client and it works
well for us; we are already leveraging Signals Boosting. We were also thinking
omitNorms might help here, i.e. actually turning norms off. The PERSON document
always ranks #1 because it's a tiny document with very short fields. I'll take
a closer look at what you sent. Thank you!
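
To illustrate the short-field effect, here is a rough sketch using the textbook
BM25 term-frequency formula. The field lengths and term counts are made up, IDF
is left out, and setting b=0 is only a stand-in for omitting norms; Lucene's
actual scoring may differ in detail.

```python
def bm25_tf(freq, field_len, avg_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component (IDF omitted for clarity)."""
    # Length normalization: fields shorter than average shrink the denominator.
    norm = k1 * (1.0 - b + b * field_len / avg_len)
    return freq * (k1 + 1.0) / (freq + norm)

AVG_LEN = 100.0  # hypothetical average field length in tokens

# Hypothetical PERSON page: "ira" once in a 5-token field.
person_score = bm25_tf(freq=1, field_len=5, avg_len=AVG_LEN)
# Hypothetical article: "ira" three times in a 400-token field.
article_score = bm25_tf(freq=3, field_len=400, avg_len=AVG_LEN)

# With b=0 the field length no longer influences the score.
person_no_norm = bm25_tf(freq=1, field_len=5, avg_len=AVG_LEN, b=0.0)
article_no_norm = bm25_tf(freq=3, field_len=400, avg_len=AVG_LEN, b=0.0)

print(f"with length norm:    person={person_score:.3f} article={article_score:.3f}")
print(f"without length norm: person={person_no_norm:.3f} article={article_no_norm:.3f}")
```

With these made-up numbers the short PERSON field wins when length
normalization is on, and the article wins once it is removed.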

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
  Public Production Support
  Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org 


-----Original Message-----
From: Trey Grainger [mailto:solrt...@gmail.com] 
Sent: Monday, April 01, 2019 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: IRA or IRA the Person


Hi Brett,

There are a couple of angles you can take here. If you are only concerned
about this specific term or a small number of other known terms like "IRA"
and want to spot fix it, you can use something like the query elevation
component in Solr (
https://lucene.apache.org/solr/guide/7_7/the-query-elevation-component.html)
to explicitly include or exclude documents.
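
As a minimal sketch of the spot-fix idea, the request below uses the elevation
component's elevateIds and excludeIds request parameters rather than an
elevate.xml file. The host, collection name, /elevate handler path, and
document IDs are all assumptions for illustration; the handler has to have the
elevation component configured for this to do anything.

```python
from urllib.parse import urlencode

def build_elevated_query(base_url, query, elevate_ids=(), exclude_ids=()):
    """Build a Solr request URL that pins or hides specific documents."""
    params = {"q": query, "enableElevation": "true"}
    if elevate_ids:
        # Comma-separated IDs to force to the top of the results.
        params["elevateIds"] = ",".join(elevate_ids)
    if exclude_ids:
        # Comma-separated IDs to drop from the results.
        params["excludeIds"] = ",".join(exclude_ids)
    return base_url + "?" + urlencode(params)

# Hypothetical collection and document ID.
url = build_elevated_query(
    "http://localhost:8983/solr/mycollection/elevate",
    "ira",
    elevate_ids=["article-ira-overview"],
)
print(url)
```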

Otherwise, if you are looking for a more data-driven approach to solving
this, you can leverage the aggregate click-streams for your users across
all of the searches on your platform to boost documents that are more
popular for any given search. We do this in our commercial product
(Lucidworks Fusion) through our Signals Boosting feature, but you could
implement something similar yourself with some work, as the general
architecture is fairly well-documented here:
https://doc.lucidworks.com/fusion-ai/4.2/user-guide/signals/index.html
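
To make the idea concrete, here is a very simplified sketch of click-based
boosting. It is not how Fusion implements Signals Boosting; it just counts
clicks per query and turns them into a boost query (bq) for the
dismax/edismax parsers. The signal data, document IDs, and the use of raw
click counts as boost weights are assumptions for illustration.

```python
from collections import Counter

def build_boost_query(signals, query, top_n=3):
    """Turn (query, clicked_doc_id) pairs into a Solr bq string."""
    wanted = query.strip().lower()
    clicks = Counter(
        doc_id for q, doc_id in signals if q.strip().lower() == wanted
    )
    # Raw click counts are used as boost weights; a real system would
    # likely normalize and decay these over time.
    return " ".join(
        f'id:"{doc_id}"^{count}' for doc_id, count in clicks.most_common(top_n)
    )

# Hypothetical click log: (query text, clicked document ID).
signals = [
    ("ira", "article-ira"),
    ("IRA", "article-ira"),
    ("ira", "person-ira-black"),
    ("roth", "article-roth"),
    ("ira", "article-ira"),
]

bq = build_boost_query(signals, "ira")
print(bq)
```

The resulting string could be passed as the bq parameter alongside q=ira so
that the documents users actually click for that query rank higher.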

If you do not have long-lived content or you do not have sufficient
signals history, you could alternatively use something like Solr's Semantic
Knowledge Graph to automatically find the term vectors most related to your
terms within your content. In that case, if the "individual retirement
account" meaning is more common across your documents, you'd probably end up
with terms related to that meaning, which could be used to apply data-driven
boosts on your query toward that concept (instead of the person, in this
case).
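
A minimal sketch of what such a request could look like, assuming a Solr
version whose JSON Facet API supports the relatedness() function and a
facetable text field named "body" (the field name and limit are hypothetical).
It only builds the request parameters; sending them to a collection's /query
handler is left out.

```python
import json

def build_relatedness_request(term, field="body", limit=10):
    """Build Solr params asking for the terms most related to `term`."""
    facet = {
        "related_terms": {
            "type": "terms",
            "field": field,
            "limit": limit,
            # Rank buckets by how strongly they relate to the foreground set.
            "sort": {"r": "desc"},
            "facet": {"r": "relatedness($fore,$back)"},
        }
    }
    return {
        "q": f"{field}:{term}",     # only look at documents mentioning the term
        "rows": 0,                  # facet output only, no documents
        "fore": f"{field}:{term}",  # foreground set: documents about the term
        "back": "*:*",              # background set: the whole index
        "json.facet": json.dumps(facet),
    }

params = build_relatedness_request("ira")
print(json.dumps(params, indent=2))
```

The top-ranked terms in the response could then be added to the original
query as boosts.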

I gave a presentation at Activate ("the Search & AI Conference") last year
on some of the more data-driven approaches to parsing and understanding the
meaning of terms within queries, including things like disambiguation
(similar to what you're doing here) and some additional approaches
leveraging a combination of query log mining, the semantic knowledge graph,
and the Solr Text Tagger. If you start handling these use cases in a more
systematic and data-driven way, you might want to check out some of the
techniques I mentioned there: Video:
https://www.youtube.com/watch?v=4fMZnunTRF8 | Slides:
https://www.slideshare.net/treygrainger/how-to-build-a-semantic-search-system


All the best,

Trey Grainger
Chief Algorithms Officer @ Lucidworks


On Mon, Apr 1, 2019 at 11:45 AM Moyer, Brett <bmo...@tiaa.org> wrote:

> Hello,
>
>         Looking for ideas on how to determine intent and drive results to
> a person result or an article result. We are a financial institution and we
> have IRAs (Individual Retirement Accounts), and we also have a page that
> talks about an advisor, IRA Black.
>
>         Our users are in a bad habit of using only single terms for
> search. A very common search term is "ira". The PERSON page ranks higher
> than the article on IRAs. With essentially no information from the user,
> what are some ways we can detect intent and rank differently? Thanks!
>
> Brett Moyer
> *************************************************************************
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *************************************************************************
>