Wow, thank you Trey, great information! We are a Fusion client; it works well for us, and we are already leveraging Signals Boosting. We were thinking omitNorms might help here, i.e. turning norms off, actually. The PERSON document always ranks #1 because it’s a tiny document with very short fields. I'll take a closer look at what you sent. Thank you!
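
To illustrate why a tiny document can win on length normalization alone, here is a minimal sketch using the textbook BM25 term-frequency component. It is not Solr's exact scoring code: IDF is left out, the field lengths and term frequencies are made-up numbers, and omitting norms is modelled as b = 0, which is an assumption about the effect rather than the actual code path.

```python
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component (IDF deliberately omitted)."""
    # Length normalization: short documents shrink this term and score higher.
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / (tf + norm)


# Hypothetical numbers, for illustration only.
AVG_LEN = 200                      # assumed average field length in tokens
PERSON = {"tf": 1, "doc_len": 5}   # tiny PERSON page, "ira" appears once
ARTICLE = {"tf": 4, "doc_len": 600}  # long IRA article, "ira" appears 4 times


def compare(b):
    """Return (person_score, article_score) for a given b."""
    person = bm25_tf(avg_len=AVG_LEN, b=b, **PERSON)
    article = bm25_tf(avg_len=AVG_LEN, b=b, **ARTICLE)
    return person, article


for label, b in (("with length norms (b=0.75)", 0.75),
                 ("norms omitted, modelled as b=0", 0.0)):
    person, article = compare(b)
    print(f"{label}: person={person:.3f} article={article:.3f}")
```

With these assumed numbers the PERSON page scores higher while length normalization is active, and the article scores higher once it is removed. Real results depend on the actual field lengths, the query parser, and which fields are searched.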

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
Public Production Support
Digital Search & Discovery
8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org

-----Original Message-----
From: Trey Grainger [mailto:solrt...@gmail.com]
Sent: Monday, April 01, 2019 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: IRA or IRA the Person

Hi Brett,

There are a couple of angles you can take here. If you are only concerned about this specific term, or a small number of other known terms like "IRA", and want to spot-fix it, you can use something like the Query Elevation Component in Solr (https://lucene.apache.org/solr/guide/7_7/the-query-elevation-component.html) to explicitly include or exclude documents.

Otherwise, if you are looking for a more data-driven approach to solving this, you can leverage the aggregate click-streams for your users across all of the searches on your platform to boost documents that are more popular for any given search. We do this in our commercial product (Lucidworks Fusion) through our Signals Boosting feature, but you could implement something similar yourself with some work, as the general architecture is fairly well documented here: https://doc.lucidworks.com/fusion-ai/4.2/user-guide/signals/index.html

If you do not have long-lived content OR you do not have sufficient signals history, you could alternatively use something like Solr's Semantic Knowledge Graph to automatically find term vectors that are the most related to your terms within your content.
In that case, if the "individual retirement account" meaning is more common across your documents, you'd probably end up with terms more related to that meaning, which could be used to do data-driven boosts on your query toward that concept (instead of the person, in this case).

I gave a presentation at Activate ("the Search & AI Conference") last year on some of the more data-driven approaches to parsing and understanding the meaning of terms within queries. It included things like disambiguation (similar to what you're doing here) and some additional approaches leveraging a combination of query log mining, the Semantic Knowledge Graph, and the Solr Text Tagger. If you start handling these use cases in a more systematic and data-driven way, you might want to check out some of the techniques I mentioned there:

Video: https://www.youtube.com/watch?v=4fMZnunTRF8
Slides: https://www.slideshare.net/treygrainger/how-to-build-a-semantic-search-system

All the best,

Trey Grainger
Chief Algorithms Officer @ Lucidworks

On Mon, Apr 1, 2019 at 11:45 AM Moyer, Brett <bmo...@tiaa.org> wrote:
> Hello,
>
> Looking for ideas on how to determine intent and drive results to
> a person result or an article result. We are a financial institution and we
> have IRAs (Individual Retirement Accounts), and we have a page that talks
> about an Advisor, IRA Black.
>
> Our users are in a bad habit of only using single terms for
> search. A very common search term is "ira". The PERSON page ranks higher
> than the article on IRAs. With essentially no information from the user,
> what are some ways we can detect and rank differently? Thanks!
>
> Brett Moyer
> *************************************************************************
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *************************************************************************
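
As a rough illustration of the click-stream boosting idea discussed in the thread, the sketch below aggregates clicks per query and turns the most-clicked documents into boost clauses that could be passed as `bq` parameters to Solr's dismax or edismax parser. This is not how Fusion's Signals Boosting is implemented. The log format, the `id` field name, the document ids, and the use of raw click counts as boost weights are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def aggregate_signals(click_log):
    """Count clicks per (normalized query, document id).

    click_log is assumed to be an iterable of (query, doc_id) pairs.
    """
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[query.strip().lower()][doc_id] += 1
    return counts


def boost_clauses(counts, query, top_n=3):
    """Build boost clauses for the most-clicked documents of a query.

    Raw click counts are used as weights here, which is a simplification;
    a real system would likely dampen and time-decay them.
    """
    top = counts.get(query.strip().lower(), Counter()).most_common(top_n)
    return [f'id:"{doc_id}"^{clicks}' for doc_id, clicks in top]


# Hypothetical click log: most users searching "ira" click the article.
log = [
    ("ira", "ira-article"),
    ("IRA", "ira-article"),
    ("ira ", "ira-article"),
    ("ira", "advisor-ira-black"),
]
signals = aggregate_signals(log)
print(boost_clauses(signals, "ira"))
```

Each returned clause would be sent as a separate `bq` parameter alongside the user's query, so documents that were clicked more often for "ira" get a proportionally larger boost.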