Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-31 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi all, reviving this thread.

For those of you who use an external file for your suggestions, how do you 
decide from your query logs what suggestions to include? Just starting out with 
some exploratory analysis of clicks, dwell times, etc., and would love to hear 
any advice from the community.

Thanks!

Best,
Audrey

On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:

It's a great idea.   And then index that file into a separate lean 
collection of just the suggestions, along with the weight as another field on 
those documents, to use for ranking them at query time with standard /select 
queries.  (this separate suggest collection would also have appropriate 
tokenization to match the partial words as the user types, like ngramming)

Erik


> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - 
audrey.lorberf...@ibm.com  wrote:
> 
> David, 
> 
> Thank you, that is useful. So, would you recommend using a (clean) field 
over an external dictionary file? We have lots of "top queries" and measure 
their nDCG. A thought was to programmatically generate an external file where 
the weight per query term (or phrase) == its nDCG. Bad idea?
> 
> Best,
> Audrey
> 
> On 1/20/20, 11:51 AM, "David Hastings"  
wrote:
> 
>Ive used this quite a bit, my biggest piece of advice is to choose a 
field
>that you know is clean, with well defined terms/words, you dont want an
>autocomplete that has a massive dictionary, also it will make the
>start/reload times pretty slow
> 
>On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
>audrey.lorberf...@ibm.com  wrote:
> 
>> Hi All,
>> 
>> We plan to incorporate a query autocomplete functionality into our search
>> engine (like this: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
 
>> ). And I was wondering if anyone has personal experience with this
>> component and would like to share? Basically, we are just looking for 
some
>> best practices from more experienced Solr admins so that we have a 
starting
>> place to launch this in our beta.
>> 
>> Thank you!
>> 
>> Best,
>> Audrey
>> 
> 
> 





Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-26 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh, great! Thank you, this is helpful!

On 1/24/20, 6:43 PM, "Walter Underwood"  wrote:

Click-based weights are vulnerable to spamming. Some of us fondly remember when
Google was showing Microsoft as the first hit for “evil empire” thanks to a
click attack.

For our ecommerce search, we use the actual titles of books weighted by order
volume. Decorated titles are reduced to a base title, so “Managerial Accounting:
Student Value Edition” becomes just “Managerial Accounting”. Showing all the
variations is the job of the real results page.

wunder
Walter Underwood
wun...@wunderwood.org

https://urldefense.proofpoint.com/v2/url?u=http-3A__observer.wunderwood.org_=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=3oEhRJWEHDoz3HXt87Y_FXxPTUZg1zSA5r4P6urviug=87IOY_vKNONtR2r2IkW-NnZ4Rn3wI-OIO6RSdqdOMfU=
   (my blog)

> On Jan 24, 2020, at 7:07 AM, Lucky Sharma  wrote:
> 
> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the documents, you can also use
> LTR (Learning to Rank) within Solr to rerank the documents.
> Also, to improve relevance within the autosuggestion and capture the
> positional context of the user's input for multi-token keywords, you can
> generate edge n-grams from bigrams/trigrams.
> 
> 
> 
> Regards,
> Lucky Sharma
> 
> On Fri, 24 Jan, 2020, 8:28 pm Lucky Sharma,  wrote:
> 
>> Hi Audrey,
>> As suggested by Erik, you can index the data into a separate collection.
>> Instead of adding weights in the documents, you can also use LTR
>> within Solr to rerank on the features.
>> 
>> Regards,
>> Lucky Sharma
>> 
>> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com,  wrote:
>> 
>>> Erik,
>>> 
>>> Thank you! Yes, that's exactly how we were thinking of architecting it.
>>> And our ML engineer suggested something else for the suggestion weights,
>>> actually -- to build a model that would programmatically update the 
weights
>>> based on those suggestions' live clicks @ position k, etc. Pretty cool
>>> idea...
>>> 
>>> 
>>> 
>>> On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
>>> 
>>>It's a great idea.   And then index that file into a separate lean
>>> collection of just the suggestions, along with the weight as another 
field
>>> on those documents, to use for ranking them at query time with standard
>>> /select queries.  (this separate suggest collection would also have
>>> appropriate tokenization to match the partial words as the user types, 
like
>>> ngramming)
>>> 
>>>Erik
>>> 
>>> 
 On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
>>> audrey.lorberf...@ibm.com  wrote:
 
 David,
 
 Thank you, that is useful. So, would you recommend using a (clean)
>>> field over an external dictionary file? We have lots of "top queries" 
and
>>> measure their nDCG. A thought was to programmatically generate an 
external
>>> file where the weight per query term (or phrase) == its nDCG. Bad idea?
 
 Best,
 Audrey
 
 On 1/20/20, 11:51 AM, "David Hastings" <
>>> hastings.recurs...@gmail.com> wrote:
 
   Ive used this quite a bit, my biggest piece of advice is to
>>> choose a field
   that you know is clean, with well defined terms/words, you dont
>>> want an
   autocomplete that has a massive dictionary, also it will make the
   start/reload times pretty slow
 
   On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
   audrey.lorberf...@ibm.com  wrote:
 
> Hi All,
> 
> We plan to incorporate a query autocomplete functionality into our
>>> search
> engine (like this:
>>> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
> ). And I was wondering if anyone has personal experience with this
> component and would like to share? Basically, we are just looking
>>> for some
> best practices from more experienced Solr admins so that we have a
>>> starting
> place to launch this in our beta.
> 
> Thank you!
> 
> Best,
> Audrey
> 
 
 
>>> 
>>> 
>>> 
>>> 





Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Walter Underwood
Click-based weights are vulnerable to spamming. Some of us fondly remember when
Google was showing Microsoft as the first hit for “evil empire” thanks to a
click attack.

For our ecommerce search, we use the actual titles of books weighted by order
volume. Decorated titles are reduced to a base title, so “Managerial Accounting:
Student Value Edition” becomes just “Managerial Accounting”. Showing all the
variations is the job of the real results page.
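
A rough illustration of the title approach described above, not the actual code behind it: a minimal Python sketch, assuming a "decoration" is whatever follows the first colon and that order data arrives as (title, units) pairs. The names `base_title` and `suggestion_weights` are invented for this example.

```python
from collections import defaultdict


def base_title(title: str) -> str:
    """Reduce a decorated title to its base title.

    Assumption: the decoration is everything after the first colon.
    """
    return title.split(":", 1)[0].strip()


def suggestion_weights(orders):
    """Sum order volume per base title.

    `orders` is an iterable of (title, units_ordered) pairs.
    Returns (base_title, total_units) pairs, highest volume first.
    """
    totals = defaultdict(int)
    for title, units in orders:
        totals[base_title(title)] += units
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical order data, for illustration only.
orders = [
    ("Managerial Accounting: Student Value Edition", 120),
    ("Managerial Accounting", 300),
    ("Organic Chemistry: Study Guide", 80),
]
for title, weight in suggestion_weights(orders):
    print(title, weight)
```

Because the weight comes from orders rather than clicks, a burst of scripted clicks cannot move a suggestion up the list.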

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 24, 2020, at 7:07 AM, Lucky Sharma  wrote:
> 
> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the documents, you can also use
> LTR (Learning to Rank) within Solr to rerank the documents.
> Also, to improve relevance within the autosuggestion and capture the
> positional context of the user's input for multi-token keywords, you can
> generate edge n-grams from bigrams/trigrams.
> 
> 
> 
> Regards,
> Lucky Sharma
> 
> On Fri, 24 Jan, 2020, 8:28 pm Lucky Sharma,  wrote:
> 
>> Hi Audrey,
>> As suggested by Erik, you can index the data into a separate collection.
>> Instead of adding weights in the documents, you can also use LTR
>> within Solr to rerank on the features.
>> 
>> Regards,
>> Lucky Sharma
>> 
>> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com,  wrote:
>> 
>>> Erik,
>>> 
>>> Thank you! Yes, that's exactly how we were thinking of architecting it.
>>> And our ML engineer suggested something else for the suggestion weights,
>>> actually -- to build a model that would programmatically update the weights
>>> based on those suggestions' live clicks @ position k, etc. Pretty cool
>>> idea...
>>> 
>>> 
>>> 
>>> On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
>>> 
>>>It's a great idea.   And then index that file into a separate lean
>>> collection of just the suggestions, along with the weight as another field
>>> on those documents, to use for ranking them at query time with standard
>>> /select queries.  (this separate suggest collection would also have
>>> appropriate tokenization to match the partial words as the user types, like
>>> ngramming)
>>> 
>>>Erik
>>> 
>>> 
 On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
>>> audrey.lorberf...@ibm.com  wrote:
 
 David,
 
 Thank you, that is useful. So, would you recommend using a (clean)
>>> field over an external dictionary file? We have lots of "top queries" and
>>> measure their nDCG. A thought was to programmatically generate an external
>>> file where the weight per query term (or phrase) == its nDCG. Bad idea?
 
 Best,
 Audrey
 
 On 1/20/20, 11:51 AM, "David Hastings" <
>>> hastings.recurs...@gmail.com> wrote:
 
   Ive used this quite a bit, my biggest piece of advice is to
>>> choose a field
   that you know is clean, with well defined terms/words, you dont
>>> want an
   autocomplete that has a massive dictionary, also it will make the
   start/reload times pretty slow
 
   On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
   audrey.lorberf...@ibm.com  wrote:
 
> Hi All,
> 
> We plan to incorporate a query autocomplete functionality into our
>>> search
> engine (like this:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
> ). And I was wondering if anyone has personal experience with this
> component and would like to share? Basically, we are just looking
>>> for some
> best practices from more experienced Solr admins so that we have a
>>> starting
> place to launch this in our beta.
> 
> Thank you!
> 
> Best,
> Audrey
> 
 
 
>>> 
>>> 
>>> 
>>> 



Re: Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
David,

True! But we are hoping that these are purely seen as suggestions and that 
people, if they know exactly what they are wanting to type/looking for, will 
simply ignore the dropdown options.

On 1/24/20, 10:03 AM, "David Hastings"  wrote:

This is a really cool idea!  My only concern is that the edge case
searches, where a user knows exactly what they want to find, would be
autocomplete into something that happens to be more "successful" rather
than what they were looking for.  for example, i want to know the legal
implications of jay z's 99 problems.   most of the autocompletes i imagine
would be for the lyrics for the song, or links to the video or jay z
himself, when what im looking for is a line by line analysis of the song
itself and how it relates to the fourth amendment:

https://urldefense.proofpoint.com/v2/url?u=http-3A__pdf.textfiles.com_academics_lj56-2D2-5Fmason-5Farticle.pdf=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=CPAGySYcW7hCqtFtjaThX2vIAhcKEMHHhYpqtqHkx-Q=XEyh7ewstUTlEuyKcYHaTU1vHMYA2-Db_nIYnl89yw4=
 

But in general this is a really clever idea, especially in the retail
arena.  However i suspect your use case is more in research, and after
years of dealing with lawyers and librarians, they tend to not like having
their searches intercepted, they know what they're looking for and they
tend to get mad if you assume they dont :)

On Fri, Jan 24, 2020 at 9:59 AM Lucky Sharma  wrote:

> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the documents, you can also use LTR
> within Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
>
> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
> audrey.lorberf...@ibm.com,
>  wrote:
>
> > Erik,
> >
> > Thank you! Yes, that's exactly how we were thinking of architecting it.
> > And our ML engineer suggested something else for the suggestion weights,
> > actually -- to build a model that would programmatically update the
> weights
> > based on those suggestions' live clicks @ position k, etc. Pretty cool
> > idea...
> >
> >
> >
> > On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
> >
> > It's a great idea.   And then index that file into a separate lean
> > collection of just the suggestions, along with the weight as another
> field
> > on those documents, to use for ranking them at query time with standard
> > /select queries.  (this separate suggest collection would also have
> > appropriate tokenization to match the partial words as the user types,
> like
> > ngramming)
> >
> > Erik
> >
> >
> > > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com  wrote:
> > >
> > > David,
> > >
> > > Thank you, that is useful. So, would you recommend using a (clean)
> > field over an external dictionary file? We have lots of "top queries" 
and
> > measure their nDCG. A thought was to programmatically generate an
> external
> > file where the weight per query term (or phrase) == its nDCG. Bad idea?
> > >
> > > Best,
> > > Audrey
> > >
> > > On 1/20/20, 11:51 AM, "David Hastings" <
> hastings.recurs...@gmail.com>
> > wrote:
> > >
> > >Ive used this quite a bit, my biggest piece of advice is to
> > choose a field
> > >that you know is clean, with well defined terms/words, you dont
> > want an
> > >autocomplete that has a massive dictionary, also it will make
> the
> > >start/reload times pretty slow
> > >
> > >On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> > >audrey.lorberf...@ibm.com  wrote:
> > >
> > >> Hi All,
> > >>
> > >> We plan to incorporate a query autocomplete functionality into 
our
> > search
> > >> engine (like this:
> >
> 
https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
> > >> ). And I was wondering if anyone has personal experience with 
this
> > >> component and would like to share? Basically, we are just looking
> > for some
> > >> best practices from more experienced Solr admins so that we have 
a
> > starting
> > >> place to launch this in our beta.
> > >>
> > >> Thank you!
> > >>
> > >> Best,
> > >> Audrey
> > >>
> > >
> > >
> >
> >
> >
> >
>




Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Lucky Sharma
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection.
Instead of adding weights in the documents, you can also use
LTR (Learning to Rank) within Solr to rerank the documents.
Also, to improve relevance within the autosuggestion and capture the
positional context of the user's input for multi-token keywords, you can
generate edge n-grams from bigrams/trigrams.
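
To make the bigram/trigram idea concrete, here is a minimal Python sketch of what such an analysis chain would emit. It is only an illustration of the technique; in Solr this would normally be done by analyzers in the schema, and the function names and the `max_size`/`min_len` defaults below are assumptions.

```python
def shingles(tokens, max_size=3):
    """All contiguous word n-grams (unigrams up to trigrams by default)."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_size + 1)
        for i in range(len(tokens) - n + 1)
    ]


def edge_ngrams(text, min_len=2):
    """All prefixes of `text` that are at least `min_len` characters long."""
    return [text[:k] for k in range(min_len, len(text) + 1)]


def index_terms(suggestion, max_size=3, min_len=2):
    """Edge n-grams of every shingle: the terms a typed prefix can match."""
    terms = set()
    for shingle in shingles(suggestion.lower().split(), max_size):
        terms.update(edge_ngrams(shingle, min_len))
    return terms


terms = index_terms("query auto suggest")
print(sorted(terms))
```

With this scheme a user who has typed "auto s" matches the suggestion through the bigram "auto suggest", while an infix such as "uery" does not match at all.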



Regards,
Lucky Sharma

On Fri, 24 Jan, 2020, 8:28 pm Lucky Sharma,  wrote:

> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the documents, you can also use LTR
> within Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
>
> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
> audrey.lorberf...@ibm.com,  wrote:
>
>> Erik,
>>
>> Thank you! Yes, that's exactly how we were thinking of architecting it.
>> And our ML engineer suggested something else for the suggestion weights,
>> actually -- to build a model that would programmatically update the weights
>> based on those suggestions' live clicks @ position k, etc. Pretty cool
>> idea...
>>
>>
>>
>> On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
>>
>> It's a great idea.   And then index that file into a separate lean
>> collection of just the suggestions, along with the weight as another field
>> on those documents, to use for ranking them at query time with standard
>> /select queries.  (this separate suggest collection would also have
>> appropriate tokenization to match the partial words as the user types, like
>> ngramming)
>>
>> Erik
>>
>>
>> > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com  wrote:
>> >
>> > David,
>> >
>> > Thank you, that is useful. So, would you recommend using a (clean)
>> field over an external dictionary file? We have lots of "top queries" and
>> measure their nDCG. A thought was to programmatically generate an external
>> file where the weight per query term (or phrase) == its nDCG. Bad idea?
>> >
>> > Best,
>> > Audrey
>> >
>> > On 1/20/20, 11:51 AM, "David Hastings" <
>> hastings.recurs...@gmail.com> wrote:
>> >
>> >Ive used this quite a bit, my biggest piece of advice is to
>> choose a field
>> >that you know is clean, with well defined terms/words, you dont
>> want an
>> >autocomplete that has a massive dictionary, also it will make the
>> >start/reload times pretty slow
>> >
>> >On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
>> >audrey.lorberf...@ibm.com  wrote:
>> >
>> >> Hi All,
>> >>
>> >> We plan to incorporate a query autocomplete functionality into our
>> search
>> >> engine (like this:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
>> >> ). And I was wondering if anyone has personal experience with this
>> >> component and would like to share? Basically, we are just looking
>> for some
>> >> best practices from more experienced Solr admins so that we have a
>> starting
>> >> place to launch this in our beta.
>> >>
>> >> Thank you!
>> >>
>> >> Best,
>> >> Audrey
>> >>
>> >
>> >
>>
>>
>>
>>


Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread David Hastings
This is a really cool idea!  My only concern is that the edge case
searches, where a user knows exactly what they want to find, would be
autocompleted into something that happens to be more "successful" rather
than what they were looking for.  for example, i want to know the legal
implications of jay z's 99 problems.   most of the autocompletes i imagine
would be for the lyrics for the song, or links to the video or jay z
himself, when what im looking for is a line by line analysis of the song
itself and how it relates to the fourth amendment:
http://pdf.textfiles.com/academics/lj56-2_mason_article.pdf

But in general this is a really clever idea, especially in the retail
arena.  However i suspect your use case is more in research, and after
years of dealing with lawyers and librarians, they tend to not like having
their searches intercepted, they know what they're looking for and they
tend to get mad if you assume they dont :)

On Fri, Jan 24, 2020 at 9:59 AM Lucky Sharma  wrote:

> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the documents, you can also use LTR
> within Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
>
> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
> audrey.lorberf...@ibm.com,
>  wrote:
>
> > Erik,
> >
> > Thank you! Yes, that's exactly how we were thinking of architecting it.
> > And our ML engineer suggested something else for the suggestion weights,
> > actually -- to build a model that would programmatically update the
> weights
> > based on those suggestions' live clicks @ position k, etc. Pretty cool
> > idea...
> >
> >
> >
> > On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
> >
> > It's a great idea.   And then index that file into a separate lean
> > collection of just the suggestions, along with the weight as another
> field
> > on those documents, to use for ranking them at query time with standard
> > /select queries.  (this separate suggest collection would also have
> > appropriate tokenization to match the partial words as the user types,
> like
> > ngramming)
> >
> > Erik
> >
> >
> > > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com  wrote:
> > >
> > > David,
> > >
> > > Thank you, that is useful. So, would you recommend using a (clean)
> > field over an external dictionary file? We have lots of "top queries" and
> > measure their nDCG. A thought was to programmatically generate an
> external
> > file where the weight per query term (or phrase) == its nDCG. Bad idea?
> > >
> > > Best,
> > > Audrey
> > >
> > > On 1/20/20, 11:51 AM, "David Hastings" <
> hastings.recurs...@gmail.com>
> > wrote:
> > >
> > >Ive used this quite a bit, my biggest piece of advice is to
> > choose a field
> > >that you know is clean, with well defined terms/words, you dont
> > want an
> > >autocomplete that has a massive dictionary, also it will make
> the
> > >start/reload times pretty slow
> > >
> > >On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> > >audrey.lorberf...@ibm.com  wrote:
> > >
> > >> Hi All,
> > >>
> > >> We plan to incorporate a query autocomplete functionality into our
> > search
> > >> engine (like this:
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
> > >> ). And I was wondering if anyone has personal experience with this
> > >> component and would like to share? Basically, we are just looking
> > for some
> > >> best practices from more experienced Solr admins so that we have a
> > starting
> > >> place to launch this in our beta.
> > >>
> > >> Thank you!
> > >>
> > >> Best,
> > >> Audrey
> > >>
> > >
> > >
> >
> >
> >
> >
>


Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Lucky Sharma
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection.
Instead of adding weights in the documents, you can also use LTR within
Solr to rerank on the features.

Regards,
Lucky Sharma

On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld - audrey.lorberf...@ibm.com,
 wrote:

> Erik,
>
> Thank you! Yes, that's exactly how we were thinking of architecting it.
> And our ML engineer suggested something else for the suggestion weights,
> actually -- to build a model that would programmatically update the weights
> based on those suggestions' live clicks @ position k, etc. Pretty cool
> idea...
>
>
>
> On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:
>
> It's a great idea.   And then index that file into a separate lean
> collection of just the suggestions, along with the weight as another field
> on those documents, to use for ranking them at query time with standard
> /select queries.  (this separate suggest collection would also have
> appropriate tokenization to match the partial words as the user types, like
> ngramming)
>
> Erik
>
>
> > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
> >
> > David,
> >
> > Thank you, that is useful. So, would you recommend using a (clean)
> field over an external dictionary file? We have lots of "top queries" and
> measure their nDCG. A thought was to programmatically generate an external
> file where the weight per query term (or phrase) == its nDCG. Bad idea?
> >
> > Best,
> > Audrey
> >
> > On 1/20/20, 11:51 AM, "David Hastings" 
> wrote:
> >
> >Ive used this quite a bit, my biggest piece of advice is to
> choose a field
> >that you know is clean, with well defined terms/words, you dont
> want an
> >autocomplete that has a massive dictionary, also it will make the
> >start/reload times pretty slow
> >
> >On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> >audrey.lorberf...@ibm.com  wrote:
> >
> >> Hi All,
> >>
> >> We plan to incorporate a query autocomplete functionality into our
> search
> >> engine (like this:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0=
> >> ). And I was wondering if anyone has personal experience with this
> >> component and would like to share? Basically, we are just looking
> for some
> >> best practices from more experienced Solr admins so that we have a
> starting
> >> place to launch this in our beta.
> >>
> >> Thank you!
> >>
> >> Best,
> >> Audrey
> >>
> >
> >
>
>
>
>


Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Alessandro,

I'm so happy there is someone who's done extensive work with QAC here! 

Right now, we measure nDCG via a Dynamic Bayesian Network (DBN). To break it 
down, we: 
- Use a DBN model to generate a "score" for each query_url pair. 
- Plug that score into a mathematical formula we found in a research 
paper (happy to share the paper if you're interested) to assign labels 0-4. 
- Cross-reference the scored & labeled query_url pairs with 1k of our 
system's top queries and 1k of our system's random queries. 
- Use that dataset as our ground truth. 
- Query the system in real time each day for those 2k queries, label the 
results, and compare those labels with our ground truth to get our system's nDCG. 
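
The last step above can be sketched roughly as follows. This is a minimal Python illustration, not the pipeline described in this message: it assumes the ground truth for one query is a dict mapping URL to a 0-4 label, that unlabeled URLs count as 0, and that the common exponential-gain form of DCG is used. The DBN scoring and the label formula from the paper are not reproduced here.

```python
import math


def dcg(labels):
    """Discounted cumulative gain with exponential gain (2^label - 1)."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 2)
        for rank, label in enumerate(labels)
    )


def ndcg_at_k(ranked_urls, truth, k=10):
    """nDCG@k for one query.

    ranked_urls: URLs returned by the system today, best first.
    truth: ground-truth labels (0-4) keyed by URL; missing URLs count as 0.
    """
    gains = [truth.get(url, 0) for url in ranked_urls[:k]]
    ideal = sorted(truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical labels for a single query.
truth = {"url_a": 4, "url_b": 2, "url_c": 0}
print(ndcg_at_k(["url_a", "url_b", "url_c"], truth))
print(ndcg_at_k(["url_b", "url_a", "url_c"], truth))
```

Averaging `ndcg_at_k` over the 2k tracked queries would give one system-level number per day.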

I hope that makes sense! Lots of steps.

Due to computational overhead reasons, we are pretty committed to using an 
external file & a separate Solr core for our suggestions. We are also planning 
to use the Suggester to add a little human nudge towards "successful" queries. 
I'm not sure whether that's what the Suggester is really meant to do, but we 
are not using it as a naïve prefix-matcher, but more of a query-suggestion 
tool. So, if we know that the query "blue pages" is less successful than the 
query "bluepages" (assuming we can identify the user's intent with this query), 
we will not show suggestions that match "blue pages," instead we will show 
suggestions that match "bluepages." Sort of like a query rewrite, except with 
fuzzy prefix matching, not the introduction of synonyms/expansions.

What we are concerned with currently is how to define a "successful" query. We 
have things like abandonment rate, dwell time, etc., but if you have any advice 
on more ways to identify successful queries, that'd be great. We want to stay 
away from defining success as "popularity," since that will just create a 
closed language system where people only query popular queries, and those 
queries stay popular only because people are querying them (assuming people 
click on the suggestions, of course).

Let me know your thoughts!

On 1/23/20, 10:45 AM, "Alessandro Benedetti"  wrote:

I have been working extensively on query autocompletion, these blogs should
be helpful to you:


https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2015_07_solr-2Dyou-2Dcomplete-2Dme.html=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI=c149I_QBokd35FBMGaUxoBPMViUXAdZtVnkSKTINndE=
 

https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2018_06_apache-2Dlucene-2Dblendedinfixsuggester-2Dhow-2Dit-2Dworks-2Dbugs-2Dand-2Dimprovements.html=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI=m8s2XvI7tR1t9bNaA4SI-w90MdbLZTYxc0mBMz8RMSw=
 

Your idea of using search quality evaluation to drive the autocompletion is
interesting.
How do you currently calculate the NDCG for a query? What's your golden
truth?
Using that approach you will autocomplete favouring query completion that
your search engine is able to process better, not necessarily closer to the
user intent, still it could work.

We should differentiate here between the suggester dictionary (where the
suggestions come from, in your case it could be your extracted data) and
the kind of suggestion (that in your case could be the free text suggester
lookup)
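
For the dictionary side, an external file could be generated along these lines. This is a minimal Python sketch under stated assumptions: one suggestion per line with a tab-separated weight (tab being the default field delimiter for Solr's file-based dictionary), and nDCG values scaled to integers on the assumption that fractional weights may be truncated when the file is read. The function name and the `scale` factor are invented for the example.

```python
def dictionary_lines(scored_queries, scale=1000):
    """Build 'suggestion<TAB>weight' lines from (query, ndcg) pairs.

    Whitespace inside a query (including tabs) is collapsed to single
    spaces so it cannot be mistaken for the field delimiter.
    """
    lines = []
    for query, ndcg in scored_queries:
        cleaned = " ".join(query.split())
        if not cleaned:
            continue  # skip empty queries
        weight = round(ndcg * scale)  # integer weight (assumed safer)
        lines.append(f"{cleaned}\t{weight}")
    return lines


# Hypothetical scores, for illustration only.
scored = [("bluepages", 0.912), ("blue  pages", 0.5), ("   ", 0.3)]
print("\n".join(dictionary_lines(scored)))
```

The resulting lines can be written to a UTF-8 text file and referenced from the suggester configuration.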

Cheers
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


On Mon, 20 Jan 2020 at 17:02, David Hastings 
wrote:

> Not a bad idea at all, however ive never used an external file before, 
just
> a field in the index, so not an area im familiar with
>
> On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > David,
> >
> > Thank you, that is useful. So, would you recommend using a (clean) field
> > over an external dictionary file? We have lots of "top queries" and
> measure
> > their nDCG. A thought was to programmatically generate an external file
> > where the weight per query term (or phrase) == its nDCG. Bad idea?
> >
> > Best,
> > Audrey
> >
> > On 1/20/20, 11:51 AM, "David Hastings" 
> > wrote:
> >
> > Ive used this quite a bit, my biggest piece of advice is to choose a
> > field
> > that you know is clean, with well defined terms/words, you dont want
> an
> > autocomplete that has a massive dictionary, also it will make the
> > start/reload times pretty slow
> >
> > On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com  wrote:
> >
> > > Hi All,
> > >
> > > We plan to incorporate a query autocomplete functionality into our
> > search
> 

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Erik,

Thank you! Yes, that's exactly how we were thinking of architecting it. And our 
ML engineer suggested something else for the suggestion weights, actually -- to 
build a model that would programmatically update the weights based on those 
suggestions' live clicks @ position k, etc. Pretty cool idea... 



On 1/23/20, 2:26 PM, "Erik Hatcher"  wrote:

It's a great idea.   And then index that file into a separate lean 
collection of just the suggestions, along with the weight as another field on 
those documents, to use for ranking them at query time with standard /select 
queries.  (this separate suggest collection would also have appropriate 
tokenization to match the partial words as the user types, like ngramming)

Erik


> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com
> wrote:
> 
> David, 
> 
> Thank you, that is useful. So, would you recommend using a (clean) field over
> an external dictionary file? We have lots of "top queries" and measure their
> nDCG. A thought was to programmatically generate an external file where the
> weight per query term (or phrase) == its nDCG. Bad idea?
> 
> Best,
> Audrey
> 
> On 1/20/20, 11:51 AM, "David Hastings" wrote:
> 
>Ive used this quite a bit, my biggest piece of advice is to choose a field
>that you know is clean, with well defined terms/words, you dont want an
>autocomplete that has a massive dictionary, also it will make the
>start/reload times pretty slow
> 
>On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
>audrey.lorberf...@ibm.com  wrote:
> 
>> Hi All,
>> 
>> We plan to incorporate a query autocomplete functionality into our search
>> engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
>> ). And I was wondering if anyone has personal experience with this
>> component and would like to share? Basically, we are just looking for some
>> best practices from more experienced Solr admins so that we have a starting
>> place to launch this in our beta.
>> 
>> Thank you!
>> 
>> Best,
>> Audrey
>> 
> 
> 





Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-23 Thread Erik Hatcher
It's a great idea. Then index that file into a separate, lean collection of
just the suggestions, with the weight as another field on those documents, and
use that field to rank them at query time with standard /select queries. (This
separate suggest collection would also have appropriate tokenization, such as
n-gramming, to match partial words as the user types.)

Erik
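
To make the matching and ranking idea concrete, here is a small in-memory sketch. It is a stand-in for the behaviour described above, not Solr code: the helper names (`edge_ngrams`, `build_index`, `suggest`) and the sample suggestions are hypothetical, and a real collection would do the n-gramming in its field analysis and the ranking in the /select request.

```python
def edge_ngrams(text, min_len=1, max_len=20):
    """Return every prefix (edge n-gram) of each whitespace-separated token."""
    grams = set()
    for token in text.lower().split():
        for n in range(min_len, min(len(token), max_len) + 1):
            grams.add(token[:n])
    return grams


def build_index(suggestions):
    """Map each edge n-gram to the (phrase, weight) entries containing it."""
    index = {}
    for phrase, weight in suggestions:
        for gram in edge_ngrams(phrase):
            index.setdefault(gram, []).append((phrase, weight))
    return index


def suggest(index, typed, limit=5):
    """Return phrases where every typed token is a prefix of some word,
    ranked by weight (highest first), ties broken alphabetically."""
    tokens = typed.lower().split()
    if not tokens:
        return []
    candidates = None
    for token in tokens:
        matches = set(index.get(token, []))
        candidates = matches if candidates is None else candidates & matches
    ranked = sorted(candidates, key=lambda entry: (-entry[1], entry[0]))
    return [phrase for phrase, _ in ranked[:limit]]
```

For example, with the entries ("solr suggester", 0.9), ("solr schema", 0.7) and ("lucene scoring", 0.8), typing "sol sc" matches only "solr schema", while typing "s" returns all three ordered by weight.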


> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com 
>  wrote:
> 
> David, 
> 
> Thank you, that is useful. So, would you recommend using a (clean) field over 
> an external dictionary file? We have lots of "top queries" and measure their 
> nDCG. A thought was to programmatically generate an external file where the 
> weight per query term (or phrase) == its nDCG. Bad idea?
> 
> Best,
> Audrey
> 
> On 1/20/20, 11:51 AM, "David Hastings"  wrote:
> 
>Ive used this quite a bit, my biggest piece of advice is to choose a field
>that you know is clean, with well defined terms/words, you dont want an
>autocomplete that has a massive dictionary, also it will make the
>start/reload times pretty slow
> 
>On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
>audrey.lorberf...@ibm.com  wrote:
> 
>> Hi All,
>> 
>> We plan to incorporate a query autocomplete functionality into our search
>> engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
>> ). And I was wondering if anyone has personal experience with this
>> component and would like to share? Basically, we are just looking for some
>> best practices from more experienced Solr admins so that we have a starting
>> place to launch this in our beta.
>> 
>> Thank you!
>> 
>> Best,
>> Audrey
>> 
> 
> 



Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-23 Thread Alessandro Benedetti
I have been working extensively on query autocompletion; these blog posts
should be helpful to you:

https://sease.io/2015/07/solr-you-complete-me.html
https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html

Your idea of using search quality evaluation to drive the autocompletion is
interesting.
How do you currently calculate the NDCG for a query? What's your golden
truth?
With that approach, autocompletion will favour completions that your search
engine handles well, which are not necessarily the ones closest to the user's
intent. Still, it could work.
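
For reference, one common way to compute NDCG for a single query is sketched below. It assumes graded relevance judgments for the returned results are already available (the "golden truth" asked about above) and normalises against the ideal ordering of those same results; the team's actual calculation may differ.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k results (ranks are 1-based)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k=10):
    """DCG of the ranking as returned, divided by the DCG of the same
    judgments sorted from most to least relevant. Returns 0.0 when no
    result has positive relevance."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of judged grades in the order the engine returned the documents, so a perfectly ordered list scores 1.0.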

We should differentiate here between the suggester dictionary (where the
suggestions come from; in your case, it could be your extracted data) and
the kind of suggestion (which in your case could be the free text suggester
lookup).

Cheers
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


On Mon, 20 Jan 2020 at 17:02, David Hastings 
wrote:

> Not a bad idea at all, however ive never used an external file before, just
> a field in the index, so not an area im familiar with
>
> On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > David,
> >
> > Thank you, that is useful. So, would you recommend using a (clean) field
> > over an external dictionary file? We have lots of "top queries" and measure
> > their nDCG. A thought was to programmatically generate an external file
> > where the weight per query term (or phrase) == its nDCG. Bad idea?
> >
> > Best,
> > Audrey
> >
> > On 1/20/20, 11:51 AM, "David Hastings" wrote:
> >
> > Ive used this quite a bit, my biggest piece of advice is to choose a field
> > that you know is clean, with well defined terms/words, you dont want an
> > autocomplete that has a massive dictionary, also it will make the
> > start/reload times pretty slow
> >
> > On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com  wrote:
> >
> > > Hi All,
> > >
> > > We plan to incorporate a query autocomplete functionality into our search
> > > engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
> > > ). And I was wondering if anyone has personal experience with this
> > > component and would like to share? Basically, we are just looking for some
> > > best practices from more experienced Solr admins so that we have a starting
> > > place to launch this in our beta.
> > >
> > > Thank you!
> > >
> > > Best,
> > > Audrey
> > >
> >
> >
> >
>


Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread David Hastings
Not a bad idea at all. However, I've never used an external file before, just
a field in the index, so it's not an area I'm familiar with.

On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> David,
>
> Thank you, that is useful. So, would you recommend using a (clean) field
> over an external dictionary file? We have lots of "top queries" and measure
> their nDCG. A thought was to programmatically generate an external file
> where the weight per query term (or phrase) == its nDCG. Bad idea?
>
> Best,
> Audrey
>
> On 1/20/20, 11:51 AM, "David Hastings" wrote:
>
> Ive used this quite a bit, my biggest piece of advice is to choose a field
> that you know is clean, with well defined terms/words, you dont want an
> autocomplete that has a massive dictionary, also it will make the
> start/reload times pretty slow
>
> On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > Hi All,
> >
> > We plan to incorporate a query autocomplete functionality into our search
> > engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
> > ). And I was wondering if anyone has personal experience with this
> > component and would like to share? Basically, we are just looking for some
> > best practices from more experienced Solr admins so that we have a starting
> > place to launch this in our beta.
> >
> > Thank you!
> >
> > Best,
> > Audrey
> >
>
>
>


Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
David, 

Thank you, that is useful. So, would you recommend using a (clean) field over 
an external dictionary file? We have lots of "top queries" and measure their 
nDCG. A thought was to programmatically generate an external file where the 
weight per query term (or phrase) == its nDCG. Bad idea?

Best,
Audrey
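
A rough sketch of generating such a file, assuming the per-query nDCG values have already been computed and that the suggester reads a plain-text dictionary with one tab-separated "suggestion, weight" pair per line. The function name `write_suggest_dictionary` and the scale factor are illustrative. The nDCG is scaled to an integer as a precaution, on the understanding that the underlying Lucene file dictionary stores weights as whole numbers, which would collapse values in [0, 1] to 0 or 1; that is worth verifying against the Solr version in use.

```python
def write_suggest_dictionary(ndcg_by_query, path, scale=1000):
    """Write one 'query<TAB>weight' line per query, highest weight first.

    ndcg_by_query: dict mapping query string -> nDCG in [0, 1]
    scale:         assumed multiplier turning nDCG into an integer weight
    """
    rows = []
    for query, score in ndcg_by_query.items():
        # Tabs and newlines inside a query would corrupt the file layout.
        clean = " ".join(query.split())
        if clean:
            rows.append((clean, int(round(score * scale))))
    rows.sort(key=lambda row: (-row[1], row[0]))

    with open(path, "w", encoding="utf-8") as handle:
        for query, weight in rows:
            handle.write(f"{query}\t{weight}\n")
    return len(rows)
```

Regenerating the file on a schedule and rebuilding the suggester afterwards would keep the weights in step with the latest evaluation run.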

On 1/20/20, 11:51 AM, "David Hastings"  wrote:

Ive used this quite a bit, my biggest piece of advice is to choose a field
that you know is clean, with well defined terms/words, you dont want an
autocomplete that has a massive dictionary, also it will make the
start/reload times pretty slow

On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Hi All,
>
> We plan to incorporate a query autocomplete functionality into our search
> engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
> ). And I was wondering if anyone has personal experience with this
> component and would like to share? Basically, we are just looking for some
> best practices from more experienced Solr admins so that we have a starting
> place to launch this in our beta.
>
> Thank you!
>
> Best,
> Audrey
>




Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread David Hastings
I've used this quite a bit. My biggest piece of advice is to choose a field
that you know is clean, with well-defined terms/words. You don't want an
autocomplete that has a massive dictionary; it will also make the
start/reload times pretty slow.

On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Hi All,
>
> We plan to incorporate a query autocomplete functionality into our search
> engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
> ). And I was wondering if anyone has personal experience with this
> component and would like to share? Basically, we are just looking for some
> best practices from more experienced Solr admins so that we have a starting
> place to launch this in our beta.
>
> Thank you!
>
> Best,
> Audrey
>