Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Hi all, reviving this thread.

For those of you who use an external file for your suggestions, how do you decide from your query logs which suggestions to include? We are just starting out with some exploratory analysis of clicks, dwell times, etc., and would love to hear any advice from the community.

Thanks!

Best,
Audrey

On 1/23/20, 2:26 PM, "Erik Hatcher" wrote:

> It's a great idea. And then index that file into a separate lean collection of just the suggestions, along with the weight as another field on those documents, to use for ranking them at query time with standard /select queries. (This separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming.)
>
> Erik
Re: Re: Anyone have experience with Query Auto-Suggestor?
Oh, great! Thank you, this is helpful!

On 1/24/20, 6:43 PM, "Walter Underwood" wrote:

> Click-based weights are vulnerable to spamming. Some of us fondly remember when Google was showing Microsoft as the first hit for “evil empire” thanks to a click attack.
>
> For our ecommerce search, we use the actual titles of books weighted by order volume. Decorated titles are reduced to a base title, so “Managerial Accounting: Student Value Edition” becomes just “Managerial Accounting”. Showing all the variations is the job of the real results page.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
Re: Anyone have experience with Query Auto-Suggestor?
Click-based weights are vulnerable to spamming. Some of us fondly remember when Google was showing Microsoft as the first hit for “evil empire” thanks to a click attack.

For our ecommerce search, we use the actual titles of books weighted by order volume. Decorated titles are reduced to a base title, so “Managerial Accounting: Student Value Edition” becomes just “Managerial Accounting”. Showing all the variations is the job of the real results page.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 24, 2020, at 7:07 AM, Lucky Sharma wrote:
>
> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the documents, you can also use LTR (Learning to Rank) within Solr to rerank the documents.
> Also, to improve the relevance of the autosuggestions and to capture the positional context of what the user is typing in the case of multi-token keywords, you can use bigrams/trigrams to generate edge n-grams.
>
> Regards,
> Lucky Sharma
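The title-based weighting Walter describes can be sketched roughly as follows. This is only an illustration under assumptions: the rule "drop everything after the first colon" is a hypothetical stand-in for however decorated titles are really reduced, and the `orders` data and function names are made up for the example.

```python
from collections import defaultdict


def base_title(title: str) -> str:
    """Reduce a decorated title to its base title.

    Assumption: the decoration follows the first colon.
    """
    return title.split(":", 1)[0].strip()


def suggestion_weights(orders):
    """Sum order volume per base title.

    `orders` is an iterable of (title, units_ordered) pairs.
    Returns (base_title, total_units) pairs, heaviest first.
    """
    totals = defaultdict(int)
    for title, units in orders:
        totals[base_title(title)] += units
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical order data, for illustration only.
orders = [
    ("Managerial Accounting: Student Value Edition", 40),
    ("Managerial Accounting", 25),
    ("Organic Chemistry: Loose-Leaf Version", 30),
]
ranked = suggestion_weights(orders)
```

Because the weight comes from orders rather than clicks on the dropdown, it is harder to inflate with a click attack, and each base title shows up only once in the suggestions.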
Re: Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
David,

True! But we are hoping that these are seen purely as suggestions, and that people who know exactly what they want to type or are looking for will simply ignore the dropdown options.

On 1/24/20, 10:03 AM, "David Hastings" wrote:

> This is a really cool idea! My only concern is that the edge-case searches, where a user knows exactly what they want to find, would be autocompleted into something that happens to be more "successful" rather than what they were looking for. For example, I want to know the legal implications of Jay Z's 99 Problems. Most of the autocompletes, I imagine, would be for the lyrics of the song, or links to the video or Jay Z himself, when what I'm looking for is a line-by-line analysis of the song itself and how it relates to the Fourth Amendment:
> http://pdf.textfiles.com/academics/lj56-2_mason_article.pdf
>
> But in general this is a really clever idea, especially in the retail arena. However, I suspect your use case is more in research, and after years of dealing with lawyers and librarians, I can say they tend not to like having their searches intercepted. They know what they're looking for, and they tend to get mad if you assume they don't :)
Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the documents, you can also use LTR (Learning to Rank) within Solr to rerank the documents.
Also, to improve the relevance of the autosuggestions and to capture the positional context of what the user is typing in the case of multi-token keywords, you can use bigrams/trigrams to generate edge n-grams.

Regards,
Lucky Sharma

On Fri, 24 Jan, 2020, 8:28 pm Lucky Sharma, wrote:

> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the documents, you can also use LTR within Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
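To make the bigram/trigram plus edge n-gram idea concrete, here is a small sketch of the terms such an index-time analysis might produce. It is plain Python rather than a Solr analyzer chain, and the function names and the `min_len` default are assumptions for illustration only.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return word n-grams (unigrams up to trigrams by default)."""
    out = []
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out


def edge_ngrams(text, min_len=2):
    """Return every prefix of `text` with at least `min_len` characters."""
    return [text[:n] for n in range(min_len, len(text) + 1)]


def index_terms(suggestion):
    """Edge n-grams of every shingle of a lowercased suggestion."""
    tokens = suggestion.lower().split()
    terms = set()
    for shingle in shingles(tokens):
        terms.update(edge_ngrams(shingle))
    return terms


terms = index_terms("managerial accounting basics")
```

With these terms, a user who has typed "accounting ba" matches the suggestion even though the input starts at the second word, while a fragment from the middle of a word such as "nagerial" does not match.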
Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
This is a really cool idea! My only concern is that the edge-case searches, where a user knows exactly what they want to find, would be autocompleted into something that happens to be more "successful" rather than what they were looking for. For example, I want to know the legal implications of Jay Z's 99 Problems. Most of the autocompletes, I imagine, would be for the lyrics of the song, or links to the video or Jay Z himself, when what I'm looking for is a line-by-line analysis of the song itself and how it relates to the Fourth Amendment:
http://pdf.textfiles.com/academics/lj56-2_mason_article.pdf

But in general this is a really clever idea, especially in the retail arena. However, I suspect your use case is more in research, and after years of dealing with lawyers and librarians, I can say they tend not to like having their searches intercepted. They know what they're looking for, and they tend to get mad if you assume they don't :)

On Fri, Jan 24, 2020 at 9:59 AM Lucky Sharma wrote:

> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the documents, you can also use LTR within Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection. Instead of adding weights in the documents, you can also use LTR within Solr to rerank on the features.

Regards,
Lucky Sharma

On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld - audrey.lorberf...@ibm.com, wrote:

> Erik,
>
> Thank you! Yes, that's exactly how we were thinking of architecting it. And our ML engineer suggested something else for the suggestion weights, actually -- to build a model that would programmatically update the weights based on those suggestions' live clicks @ position k, etc. Pretty cool idea...
Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Hi Alessandro,

I'm so happy there is someone who's done extensive work with QAC here!

Right now, we measure nDCG via a Dynamic Bayesian Network (DBN). To break it down:

- We use a DBN model to generate a "score" for each query_url pair.
- We then plug that score into a mathematical formula we found in a research paper (happy to share the paper if you're interested) for assigning labels 0-4.
- We then cross-reference the scored and labeled query_url pairs with 1k of our system's top queries and 1k of our system's random queries.
- We use that dataset as our ground truth.
- We then query the system in real time each day for those 2k queries, label the results, and compare those labels with our ground truth to get our system's nDCG.

I hope that makes sense! Lots of steps.

For reasons of computational overhead, we are pretty committed to using an external file and a separate Solr core for our suggestions. We are also planning to use the Suggester to add a little human nudge towards "successful" queries. I'm not sure whether that's what the Suggester is really meant to do, but we are not using it as a naïve prefix-matcher; we are using it more as a query-suggestion tool. So, if we know that the query "blue pages" is less successful than the query "bluepages" (assuming we can identify the user's intent with this query), we will not show suggestions that match "blue pages"; instead we will show suggestions that match "bluepages." It is sort of like a query rewrite, except with fuzzy prefix matching rather than the introduction of synonyms/expansions.

What we are concerned with currently is how to define a "successful" query. We have things like abandonment rate, dwell time, etc., but if you have any advice on more ways to identify successful queries, that'd be great. We want to stay away from defining success as "popularity," since that will just create a closed language system where people only query popular queries, and those queries stay popular only because people are querying them (assuming people click on the suggestions, of course).

Let me know your thoughts!

On 1/23/20, 10:45 AM, "Alessandro Benedetti" wrote:

> I have been working extensively on query autocompletion; these blogs should be helpful to you:
>
> https://sease.io/2015/07/solr-you-complete-me.html
> https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html
>
> Your idea of using search quality evaluation to drive the autocompletion is interesting.
> How do you currently calculate the NDCG for a query? What's your golden truth?
> Using that approach you will autocomplete favouring query completions that your search engine is able to process better, not necessarily ones closer to the user intent. Still, it could work.
>
> We should differentiate here between the suggester dictionary (where the suggestions come from; in your case it could be your extracted data) and the kind of suggestion (which in your case could be the free text suggester lookup).
>
> Cheers
> --
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io
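For readers following the nDCG steps above, here is a minimal sketch of the final computation, assuming the 0-4 labels have already been assigned. The thread does not say which gain and discount the team actually uses, so the common `2^label - 1` gain with a `log2` discount is an assumption here, as is computing the ideal ordering from the same list of labels instead of from the full ground-truth set.

```python
import math


def dcg(labels, k):
    """Discounted cumulative gain over the top-k graded labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))


def ndcg(labels, k=10):
    """nDCG@k for one query; `labels` are 0-4 grades in ranked order."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(labels_by_query, k=10):
    """Average nDCG@k over all evaluated queries."""
    scores = [ndcg(labels, k) for labels in labels_by_query.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels for two queries, for illustration only.
daily = {"bluepages": [4, 3, 2, 1, 0], "blue pages": [0, 4]}
system_score = mean_ndcg(daily)
```

A query whose best document sits in second place, as in the "blue pages" example, scores well below 1.0, which is the kind of signal that would push its suggestion weight down.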
Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Erik,

Thank you! Yes, that's exactly how we were thinking of architecting it. And our ML engineer suggested something else for the suggestion weights, actually -- to build a model that would programmatically update the weights based on those suggestions' live clicks @ position k, etc. Pretty cool idea...

On 1/23/20, 2:26 PM, "Erik Hatcher" wrote:

> It's a great idea. And then index that file into a separate lean collection of just the suggestions, along with the weight as another field on those documents, to use for ranking them at query time with standard /select queries. (This separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming.)
>
> Erik
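One very simple way to turn "clicks @ position k" into weights is sketched below. It is not the model mentioned above, which the thread does not describe. It assumes logged events of the form (suggestion, 1-based position, clicked), credits clicks at lower positions more because fewer users look that far down, and smooths by a hypothetical `prior_impressions` count so rarely shown suggestions do not jump to the top.

```python
import math
from collections import defaultdict


def click_weights(events, prior_impressions=10.0):
    """Position-adjusted click rate per suggestion.

    `events` is an iterable of (suggestion, position, clicked) with
    1-based positions. A click at position k earns log2(k + 1) credit.
    """
    credit = defaultdict(float)
    shown = defaultdict(int)
    for suggestion, position, clicked in events:
        shown[suggestion] += 1
        if clicked:
            credit[suggestion] += math.log2(position + 1)
    return {s: credit[s] / (shown[s] + prior_impressions) for s in shown}


# Hypothetical log rows, for illustration only.
events = [
    ("bluepages", 1, True),
    ("bluepages", 1, False),
    ("blue pages", 3, True),
]
weights = click_weights(events, prior_impressions=0.0)
```

Any scheme like this stays exposed to click spamming, so capping the contribution per user or session would be a sensible addition before trusting the numbers.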
Re: Re: Anyone have experience with Query Auto-Suggestor?
It's a great idea. And then index that file into a separate lean collection of just the suggestions, along with the weight as another field on those documents, to use for ranking them at query time with standard /select queries. (This separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming.)

Erik

> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
>
> David,
>
> Thank you, that is useful. So, would you recommend using a (clean) field over an external dictionary file? We have lots of "top queries" and measure their nDCG. A thought was to programmatically generate an external file where the weight per query term (or phrase) == its nDCG. Bad idea?
>
> Best,
> Audrey
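A rough sketch of what the query side of such a suggest collection could look like follows. The collection name `suggestions`, the fields `suggestion`, `suggestion_ngram` and `weight`, and the base URL are all hypothetical. The sketch also assumes the n-gram field is analyzed with edge n-grams at index time only, so that the typed prefix matches as a single term. Only the standard `q`, `fl`, `sort` and `rows` parameters are used.

```python
from urllib.parse import urlencode


def suggest_url(base_url, collection, prefix, rows=5):
    """Build a /select URL that ranks matching suggestions by weight."""
    # Escape backslashes and double quotes so the prefix stays one phrase.
    safe = prefix.lower().replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'suggestion_ngram:"{safe}"',  # hypothetical n-gram field
        "fl": "suggestion,weight",          # hypothetical stored fields
        "sort": "weight desc",              # heaviest suggestions first
        "rows": rows,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = suggest_url("http://localhost:8983/solr", "suggestions", "Blu")
```

Sorting on a plain numeric field keeps the ranking easy to reason about, and the same collection could later be reranked differently without touching the client.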
Re: Re: Anyone have experience with Query Auto-Suggestor?
I have been working extensively on query autocompletion; these blog posts should be helpful to you:

https://sease.io/2015/07/solr-you-complete-me.html
https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html

Your idea of using search quality evaluation to drive the autocompletion is interesting. How do you currently calculate the NDCG for a query? What is your ground truth? With that approach you will favour completions that your search engine is able to process well, which are not necessarily the ones closest to the user's intent. Still, it could work.

We should differentiate here between the suggester dictionary (where the suggestions come from; in your case it could be your extracted data) and the kind of suggestion (which in your case could be the free text suggester lookup).

Cheers
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Mon, 20 Jan 2020 at 17:02, David Hastings wrote:
> Not a bad idea at all. However, I've never used an external file before, just a field in the index, so it's not an area I'm familiar with.
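For readers following the NDCG question above, here is a minimal sketch of how nDCG for a single query is commonly computed. It assumes linear gains, a log2 rank discount, and an ideal ranking built only from the judged results that were returned; none of these choices are confirmed by the thread, and the graded judgments in the example are invented for illustration.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains, k=10):
    """nDCG@k: DCG of the returned order divided by DCG of the ideal order.

    Assumption: the ideal order is derived from the same list of judged
    gains, so relevant documents that were never returned are ignored.
    """
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0 = irrelevant, 3 = perfect) in ranked order.
judged = [3, 2, 3, 0, 1, 2]
print(round(ndcg(judged), 3))
```

Whatever the ground truth is (explicit judgments or gains derived from clicks), the resulting score describes how well the engine ranks results for that query, which is why using it as a suggestion weight favours queries the engine already handles well.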
Re: Re: Anyone have experience with Query Auto-Suggestor?
Not a bad idea at all. However, I've never used an external file before, just a field in the index, so it's not an area I'm familiar with.

On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> David,
>
> Thank you, that is useful. So, would you recommend using a (clean) field over an external dictionary file? We have lots of "top queries" and measure their nDCG. A thought was to programmatically generate an external file where the weight per query term (or phrase) == its nDCG. Bad idea?
>
> Best,
> Audrey
Re: Re: Anyone have experience with Query Auto-Suggestor?
David,

Thank you, that is useful. So, would you recommend using a (clean) field over an external dictionary file? We have lots of "top queries" and measure their nDCG. A thought was to programmatically generate an external file where the weight per query term (or phrase) == its nDCG. Bad idea?

Best,
Audrey

On 1/20/20, 11:51 AM, "David Hastings" wrote:
> I've used this quite a bit. My biggest piece of advice is to choose a field that you know is clean, with well-defined terms/words. You don't want an autocomplete with a massive dictionary; it will also make the start/reload times pretty slow.
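A minimal sketch of generating the external file proposed above, assuming a plain tab-separated layout of one "suggestion<TAB>weight" pair per line. The function name build_dictionary_lines, the scale factor, and the sample scores are all hypothetical. The scores are scaled to whole numbers on the assumption that a file-based dictionary may read weights as integers, in which case raw nDCG values between 0 and 1 would lose their ordering; check this against the suggester documentation for your version.

```python
def build_dictionary_lines(ndcg_by_query, scale=1000, min_ndcg=0.0):
    """Turn {query: nDCG} into sorted 'query<TAB>weight' lines.

    Assumptions: weights are scaled to integers, whitespace (including
    tabs) inside a query is collapsed to single spaces, and queries that
    are empty or score below min_ndcg are dropped.
    """
    rows = []
    for query, score in ndcg_by_query.items():
        cleaned = " ".join(query.lower().split())
        if not cleaned or score < min_ndcg:
            continue
        rows.append((cleaned, int(round(score * scale))))
    # Highest weight first; ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [f"{query}\t{weight}" for query, weight in rows]


# Hypothetical top queries with made-up nDCG scores.
sample = {"Solr  Suggester": 0.91, "ngram tokenizer": 0.455, "": 0.8}
lines = build_dictionary_lines(sample)
print("\n".join(lines))
```

Writing the lines to disk is then a single open(..., "w", encoding="utf-8") call, and the file can be regenerated whenever the nDCG measurements are refreshed.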
Re: Anyone have experience with Query Auto-Suggestor?
I've used this quite a bit. My biggest piece of advice is to choose a field that you know is clean, with well-defined terms/words. You don't want an autocomplete with a massive dictionary; it will also make the start/reload times pretty slow.

On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> Hi All,
>
> We plan to incorporate a query autocomplete functionality into our search engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html). And I was wondering if anyone has personal experience with this component and would like to share? Basically, we are just looking for some best practices from more experienced Solr admins so that we have a starting place to launch this in our beta.
>
> Thank you!
>
> Best,
> Audrey
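One way to act on the "clean field, small dictionary" advice above is to filter candidate terms before they reach the suggester. The sketch below is an assumption-laden illustration, not a description of how anyone in the thread does it: the pattern, the minimum count, and the cap on dictionary size are arbitrary, and clean_terms is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed definition of a "well-defined" term: starts with a letter,
# 3 to 30 characters, only lowercase letters, digits, and hyphens.
TERM_PATTERN = re.compile(r"^[a-z][a-z0-9\-]{2,29}$")


def clean_terms(values, min_count=2, max_terms=10000):
    """Return (term, count) pairs worth keeping in a suggester dictionary."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(".,;:!?\"'()[]")
            if TERM_PATTERN.match(token):
                counts[token] += 1
    # Drop rare terms, then cap the dictionary size to keep builds fast.
    frequent = [(term, n) for term, n in counts.items() if n >= min_count]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:max_terms]


# Hypothetical field values containing some noise.
field_values = ["Solr suggester setup", "solr Suggester, tuning", "x9$$ solr ###"]
print(clean_terms(field_values))
```

Capping the number of terms addresses the start/reload cost mentioned above, since the suggester has less to build each time.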
Anyone have experience with Query Auto-Suggestor?
Hi All,

We plan to incorporate a query autocomplete functionality into our search engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html). And I was wondering if anyone has personal experience with this component and would like to share? Basically, we are just looking for some best practices from more experienced Solr admins so that we have a starting place to launch this in our beta.

Thank you!

Best,
Audrey