How can I create a good autosuggest list with phrases?

2011-08-04 Thread Shawn Heisey
I'm at the point in my Solr deployment where I want to start using it
for autosuggest, but I've run into a snag.  Because the fields that I
want to use for autosuggest are tokenized, I can only get single terms
out of them.  I would like to have it find common phrases that are
between two and five words long, so that if someone starts typing "ang",
their autosuggest list will include "Angelina Jolie" as well as possibly
"Brad Pitt and Angelina Jolie".


My index is already quite large, so I do not want to add shingles.  I 
tried to use the clustering component, but that will only give you 
halfway decent results if you make the rows= parameter absolutely huge 
and therefore things run very slowly.  Also, it only works against 
stored fields, so I can only run it against the field where we retrieve 
captions, not the full description.  It's impractical to get results 
based on an entire index, much less all seven shards.


I'm OK with offline analysis to generate a list of suggestions, and I'm 
also OK with doing that analysis against the MySQL data source rather 
than Solr.  I just need some pointers about what software and/or 
techniques I can use to generate a good list, and then some idea of how 
to configure Solr to use that list.  Can anyone help?
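
A minimal sketch of the kind of offline analysis described above,
assuming the captions or descriptions have already been exported from
MySQL as plain strings.  The function name, the sample captions, and the
thresholds are all hypothetical; it simply counts every run of two to
five consecutive words and keeps the ones that repeat.

```python
import re
from collections import Counter

def common_phrases(texts, min_words=2, max_words=5, min_count=2):
    """Count word n-grams of min_words..max_words words across all texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # Keep only phrases seen at least min_count times, most frequent first.
    return [(p, c) for p, c in counts.most_common() if c >= min_count]

# Hypothetical captions standing in for rows exported from MySQL.
captions = [
    "Angelina Jolie arrives at the premiere",
    "Brad Pitt and Angelina Jolie at the premiere",
    "Brad Pitt and Angelina Jolie leave the hotel",
]
phrases = common_phrases(captions)
```

On real data this would also surface filler phrases such as "at the",
so some stopword filtering is probably needed, and counting every
n-gram in memory may not scale to a very large corpus without batching.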


Thanks,
Shawn



Re: How can I create a good autosuggest list with phrases?

2011-08-04 Thread Sethi, Parampreet
We handled a similar requirement in our product, kitchendaily.com, by
creating a list of search terms that were frequently searched over a
period of time and then building an auto-suggestion index from this data.
Updating this list regularly will let you support a well-formed
auto-suggest feature. This is a good and fast solution if you have
application logs to start with and not a very high volume of data.
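
A rough sketch of the query-log approach, not the kitchendaily.com
implementation: it assumes the logged queries are available as a list of
strings, and the function names, sample queries, and min_count threshold
are hypothetical.

```python
from collections import Counter

def build_suggestions(logged_queries, min_count=2):
    """Return (query, count) pairs for queries seen at least min_count times."""
    # Normalize case and whitespace so near-identical queries are counted together.
    counts = Counter(" ".join(q.lower().split()) for q in logged_queries)
    counts.pop("", None)  # ignore empty queries
    frequent = ((q, c) for q, c in counts.items() if c >= min_count)
    # Most frequent first, ties broken alphabetically.
    return sorted(frequent, key=lambda qc: (-qc[1], qc[0]))

def suggest(suggestions, prefix, limit=5):
    """Return up to `limit` queries in which some word starts with the prefix."""
    prefix = " ".join(prefix.lower().split())
    return [q for q, _ in suggestions
            if q.startswith(prefix) or (" " + prefix) in q][:limit]

# Hypothetical log entries.
logged = ["Chicken Soup", "chicken soup", "chicken  salad",
          "chicken salad", "chicken soup", "apple pie"]
suggestions = build_suggestions(logged)
```

Here suggest(suggestions, "chi") would return the two chicken queries in
frequency order, and "apple pie" is dropped because it was only searched
once.  The linear scan is fine for a small list; a larger one would want
an index of some kind.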

Or you can search Solr with the user-entered text, which returns all the
matching results. Boost the field that will be used in the autosuggest
box, and show the top 5 items in the dynamic div.
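
As an illustration only, the request for this second option might be
built like the sketch below.  The base URL and the field names ("title",
"text") are made up, and it assumes the edismax query parser is
available and that the fields are lowercased at index time; whether a
trailing wildcard behaves well depends on the Solr version and the field
analysis.

```python
from urllib.parse import urlencode

def suggest_url(base_url, user_input, boosted_field,
                boost=10, fallback_field="text", rows=5):
    """Build a Solr select URL for a prefix search with one boosted field."""
    # Wildcard terms are typically not analyzed, so lowercase here (assumption).
    prefix = user_input.strip().lower()
    params = {
        "q": prefix + "*",                                  # prefix match on typed text
        "defType": "edismax",                               # parser that supports qf boosts
        "qf": f"{boosted_field}^{boost} {fallback_field}",  # boost the suggest field
        "fl": boosted_field,                                # return only the displayed field
        "rows": rows,                                       # top N items for the dropdown
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)

# Hypothetical host, handler path, and field name.
url = suggest_url("http://localhost:8983/solr/select", "Ang", "title")
```

Real user input would also need Solr's query syntax characters escaped
before the wildcard is appended.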

Hope it helps.

-param





Re: How can I create a good autosuggest list with phrases?

2011-08-04 Thread Shawn Heisey

On 8/4/2011 10:04 AM, Sethi, Parampreet wrote:

We handled similar requirement in our product kitchendaily.com by creating a
list of Search terms which were frequently searched over a period of time
and then building auto-suggestion index from this data. The constant updates
of this will allow you to support a well formed auto-suggest feature. This
is a good and faster solution if you have application logs to start with and
not very high volume of data.


I do have some separate plans to include data from our query logs, but
I'd also like to get data from the index itself, as phrases of more than
one term rather than single terms.


Thanks,
Shawn