[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848773#action_12848773 ]

jonas stock commented on SOLR-1316:
-----------------------------------

Hello. I am having trouble applying this patch. Can anyone send me a short how-to for Windows?

> Create autosuggest component
> ----------------------------
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.4
> Reporter: Jason Rutherglen
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.5
> Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib.
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844083#action_12844083 ]

Grant Ingersoll commented on SOLR-1316:
---------------------------------------

What's the status on this?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833786#action_12833786 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------

I would lean towards the latter - complex do-it-all components often suffer from creeping featuritis and insufficient testing/maintenance, because there are few users that use all their features, and few developers that understand how they work. I subscribe to the Unix philosophy - do one thing, and do it right, so I think that if we can implement autosuggest that works well from the technical POV, then it will become a reliable component that you can combine in many creative ways to satisfy different scenarios, of which there are likely many more than what you described ...
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833566#action_12833566 ]

Jan Høydahl commented on SOLR-1316:
-----------------------------------

Found this excellent article on the usability of auto-suggest: http://blog.twigkit.com/search-suggestions-part-1/

A bit more discussion of use cases could help us make sure that we're solving the right problem :)

One complex use case could be:
* First display 0-1 suggestions if some terms are misspelled.
* Then display 0-3 suggestions from the categories field, ordered by the number of docs in each category.
* Then display 0-10 generic suggestions based on the fields "title,keywords", ordered by relevancy and tf/idf.

Yet another use case (for shopping sites) is:
* First display 0-3 actual products, if there is an exact match on product name. These suggestions are of type "instant result", and must return to the frontend all data necessary to display a preview of the instant result. Clicking this suggestion takes the user directly to the product page.
* Then display 0-10 suggestions from an index (separate Solr core?) containing actual user queries, with offensive content filtered out, sorted by relevancy and boosted by frequency.

One way to back the suggestions with a user query log is through a full-blown Solr core, where in addition to the terms you also index metadata such as usage frequency. Your auto-suggest query could then benefit from all of Solr's power in ranking etc.

Do complex scenarios like this belong within one request to one instance of the autosuggest component? That would be very powerful. Or is it better to require users to build a proxy between the frontend and Solr which issues multiple requests to multiple autosuggest URLs and/or queries to ordinary indices?
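The staged display described in the first use case could be assembled on the proxy side roughly as follows. This is only a sketch under assumptions: `StagedSuggestSketch`, `Stage` and `merge` are hypothetical names that do not appear in the patch, and each stage's candidate list is assumed to have been fetched already (for example from a spellchecker, a category facet and a generic suggester).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical proxy-side helper; not part of the patch or of Solr. */
class StagedSuggestSketch {
    /** One source of suggestions with an upper bound on how many to show. */
    static final class Stage {
        final int max;
        final List<String> candidates;

        Stage(int max, List<String> candidates) {
            this.max = max;
            this.candidates = candidates;
        }
    }

    /** Concatenates the stages in order, capping each and skipping duplicates. */
    static List<String> merge(List<Stage> stages) {
        Set<String> seen = new LinkedHashSet<>();
        for (Stage stage : stages) {
            int taken = 0;
            for (String candidate : stage.candidates) {
                if (taken == stage.max) break;
                // Only count a candidate against the cap if it is new.
                if (seen.add(candidate)) taken++;
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
            new Stage(1, Arrays.asList("solr", "sole")),          // spelling
            new Stage(3, Arrays.asList("solr", "solar panels")),  // categories
            new Stage(10, Arrays.asList("solr tutorial")));       // generic
        System.out.println(merge(stages)); // [solr, solar panels, solr tutorial]
    }
}
```

Whether this logic lives in a proxy or inside a single component is exactly the open question of the comment; the sketch only shows that the merge step itself is small.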
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831544#action_12831544 ]

Shalin Shekhar Mangar commented on SOLR-1316:
---------------------------------------------

{quote}Where are we on this - do people feel it's ready to commit?{quote}

It has been some time since I looked at it, but I don't feel it is ready. Using it through spellcheck works, but specifying spellcheck params feels odd. Also, I don't know how well it compares to regular TermsComponent or facet.prefix searches in terms of memory and CPU cost.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831490#action_12831490 ]

Yonik Seeley commented on SOLR-1316:
------------------------------------

Where are we on this - do people feel it's ready to commit? We probably want to add some unit tests too, and some documentation on the wiki at some point.

AFAIK, we're limited to one spellcheck component per request handler - that should be OK though, since presumably this is meant to be used on its own, right?

What is the recommended/default configuration? We should probably add it as an /autocomplete handler in the example server.

Does this currently work with phrases?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804325#action_12804325 ]

David Smiley commented on SOLR-1316:
------------------------------------

For auto-complete, I use Solr's faceting via {{facet.prefix}} as described in the Solr book which I authored. How does this approach differ from that? I suspect that the use of radix trees and the like as described here will result in lower memory (RAM) requirements. It would be interesting to see rough RAM requirements for the facet prefix approach based on the number of distinct terms... though only a fraction of the term space ends up being auto-completed.
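For readers unfamiliar with the {{facet.prefix}} approach mentioned here, a request of that kind might be built as below. The parameters `facet`, `facet.field`, `facet.prefix`, `facet.limit`, `facet.mincount` and `rows` are standard Solr request parameters; the class name, the field name `title_ac` and the choice of limit are assumptions made only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a facet.prefix auto-complete query string. */
class FacetPrefixQuerySketch {
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the query string; the caller appends it to the select URL. */
    static String buildQuery(String field, String typed, int limit) {
        return "q=" + enc("*:*")
            + "&rows=0"                       // no documents, only facet counts
            + "&facet=true"
            + "&facet.field=" + enc(field)
            + "&facet.prefix=" + enc(typed)   // what the user has typed so far
            + "&facet.limit=" + limit
            + "&facet.mincount=1";
    }

    public static void main(String[] args) {
        // "title_ac" is an assumed field name.
        System.out.println(buildQuery("title_ac", "sol", 10));
    }
}
```

The response lists the indexed terms of the field that start with the typed prefix together with their document counts, which is what makes the approach usable for auto-complete without a dedicated component.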
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796182#action_12796182 ]

Brad Giaccio commented on SOLR-1316:
------------------------------------

bq. We could do that but as Andrzej noted, we'd end up re-implementing a lot of its functionality. I'm not sure if it is worth it. I agree that it'd be odd using parameters prefixed with "spellcheck" for auto-suggest and it'd have been easier if it were vice-versa. Does anybody have a suggestion?

Couldn't you just extend the SpellCheckComponent, and make use of something like COMPONENT_NAME or PARAM_PREFIX in the param calls instead of the static string in SpellingParams? That way the AutoSuggestComponent would have COMPONENT_NAME=autoSuggest and the spelling component would have it set to spelling, and all the common code could just live in the base class.

Just a thought.
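The idea of routing every parameter name through a per-component prefix can be sketched as follows. None of these classes are Solr's: `PrefixedParamsComponent`, `SpellingSketch` and `AutoSuggestSketch` are hypothetical stand-ins, a plain `Map` replaces Solr's request parameters, and `count` is used only as an example parameter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the shared base class; not Solr's SpellCheckComponent. */
abstract class PrefixedParamsComponent {
    /** Each subclass supplies the prefix used for all of its request parameters. */
    protected abstract String paramPrefix();

    /** Builds the full parameter name, e.g. "spellcheck.count". */
    protected String param(String name) {
        return paramPrefix() + "." + name;
    }

    /** Common code reads parameters through param(...) only, never a fixed string. */
    int count(Map<String, String> params, int defaultCount) {
        String raw = params.get(param("count"));
        return raw == null ? defaultCount : Integer.parseInt(raw);
    }
}

class SpellingSketch extends PrefixedParamsComponent {
    @Override protected String paramPrefix() { return "spellcheck"; }
}

class AutoSuggestSketch extends PrefixedParamsComponent {
    @Override protected String paramPrefix() { return "autoSuggest"; }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("autoSuggest.count", "5");
        params.put("spellcheck.count", "3");
        System.out.println(new AutoSuggestSketch().count(params, 1)); // 5
        System.out.println(new SpellingSketch().count(params, 1));    // 3
    }
}
```

With this shape the shared logic stays in one place, and the only thing a new component has to provide is its prefix.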
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791164#action_12791164 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------

bq. What about DAWGs? Are we still considering them?

I would be happy to include DAWGs if someone were to implement them ... ;)
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789797#action_12789797 ]

Shalin Shekhar Mangar commented on SOLR-1316:
---------------------------------------------

{quote}My thinking was that the usual scenario is that you submit autosuggest queries soon after user starts typing the query, and the highest perceived value of such functionality is when it can suggest complete meaningful phrases and not just individual terms. I.e. when you start typing "token sug" it won't suggest "token sugar" but instead it will suggest "token suggestions".{quote}

Yes, but the choice between complete phrases and individual terms should be up to the user. This is controlled by the "queryAnalyzerFieldType" in SpellCheckComponent. We will index the tokens returned by that analyzer, so the user can configure whichever behavior they want. For example, if it is a KeywordAnalyzer we will index/suggest phrases, and if it is a WhitespaceAnalyzer we will index/suggest individual terms.

{quote}Such as? What you put there is what you get so the fact that we are getting complete phrases as suggestions is the consequence of the choice above - the trie in this case is populated with phrases. If we populate it with tokens, then we can return per-token suggestions, again - losing the added value I mentioned above.{quote}

My point was that SpellingResult is too coarse. It is a complete result (for all tokens given by "queryAnalyzerFieldType"). If that analyzer gives us multiple tokens, then we must get suggestions for each. In that case returning a SpellingResult for each token is not right. Instead, the Suggester should combine the suggestions for all tokens into a single SpellingResult object. I don't have a concrete alternative to propose. It looks like we may need to invent a custom type which represents the (suggestion, frequency) pair.

{quote}For now I'm sure that we do NOT want to use the impl. of RadixTree in this patch, because it doesn't support our use case - I'll prepare a patch that removes this impl. Other implementations seem comparable wrt. speed, based on casual tests using /usr/share/dict/words, but I didn't run any exact benchmarks yet.{quote}

OK. Go ahead with the patch and I'll try to find some time to compare the two methods. What about DAWGs? Are we still considering them?

{quote}Shouldn't we be creating a separate AutoSuggestComponent like the SpellCheckComponent having its own prepare, process and inform functions?{quote}

We could do that but, as Andrzej noted, we'd end up re-implementing a lot of its functionality. I'm not sure if it is worth it. I agree that it'd be odd using parameters prefixed with "spellcheck" for auto-suggest, and it'd have been easier if it were vice-versa. Does anybody have a suggestion?
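One possible shape for the (suggestion, frequency) pair mentioned above, together with a per-token container that a suggester could fill before converting to a single result, is sketched here. `Suggestion` and `SuggestionsByToken` are hypothetical names invented for this illustration; neither is part of the patch or of Solr's spelling API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical value type: one suggested string plus its frequency. */
final class Suggestion implements Comparable<Suggestion> {
    final String text;
    final int frequency;

    Suggestion(String text, int frequency) {
        this.text = text;
        this.frequency = frequency;
    }

    /** Orders by descending frequency, then alphabetically. */
    @Override
    public int compareTo(Suggestion other) {
        int byFreq = Integer.compare(other.frequency, frequency);
        return byFreq != 0 ? byFreq : text.compareTo(other.text);
    }

    @Override
    public String toString() {
        return text + "(" + frequency + ")";
    }
}

/** Hypothetical per-request container: suggestions grouped by input token. */
final class SuggestionsByToken {
    private final Map<String, List<Suggestion>> byToken = new LinkedHashMap<>();

    void add(String token, Suggestion suggestion) {
        byToken.computeIfAbsent(token, k -> new ArrayList<>()).add(suggestion);
    }

    /** Returns a sorted copy of the suggestions for a token, most frequent first. */
    List<Suggestion> get(String token) {
        List<Suggestion> empty = Collections.emptyList();
        List<Suggestion> sorted = new ArrayList<>(byToken.getOrDefault(token, empty));
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        SuggestionsByToken result = new SuggestionsByToken();
        result.add("sug", new Suggestion("sugar", 3));
        result.add("sug", new Suggestion("suggestions", 7));
        System.out.println(result.get("sug")); // [suggestions(7), sugar(3)]
    }
}
```

The lookup would then return the fine-grained pairs per token, and only the outer component would decide how to fold them into the coarser response format.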
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789418#action_12789418 ]

Ankul Garg commented on SOLR-1316:
----------------------------------

Shouldn't we be creating a separate AutoSuggestComponent, like the SpellCheckComponent, having its own prepare, process and inform functions?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788913#action_12788913 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------

Thanks for the review!

bq. Why do we concatenate all the tokens into one before calling Lookup#lookup? It seems we should be getting suggestions for each token just as SpellCheckComponent does.

Yeah, it's disputable, and we could change it to use single tokens ... My thinking was that the usual scenario is that you submit autosuggest queries soon after the user starts typing the query, and the highest perceived value of such functionality is when it can suggest complete meaningful phrases and not just individual terms. I.e. when you start typing "token sug" it won't suggest "token sugar" but instead it will suggest "token suggestions".

bq. Related to #1, the Lookup#lookup method should return something more fine grained rather than a SpellingResult

Such as? What you put there is what you get ;) so the fact that we are getting complete phrases as suggestions is the consequence of the choice above - the trie in this case is populated with phrases. If we populate it with tokens, then we can return per-token suggestions, again - losing the added value I mentioned above.

bq. Has anyone done any benchmarking to figure out the data structure we want to go ahead with?

For now I'm sure that we do NOT want to use the impl. of RadixTree in this patch, because it doesn't support our use case - I'll prepare a patch that removes this impl. Other implementations seem comparable wrt. speed, based on casual tests using /usr/share/dict/words, but I didn't run any exact benchmarks yet.
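The difference between populating the lookup with phrases and populating it with tokens can be shown with a small prefix lookup. This sketch uses a sorted `TreeSet` as a stand-in for the trie, which is an assumption made purely for brevity; `PrefixLookupSketch` is a hypothetical class, not the `Lookup` interface from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical prefix lookup backed by a sorted set instead of a trie. */
class PrefixLookupSketch {
    private final TreeSet<String> entries = new TreeSet<>();

    void add(String entry) {
        entries.add(entry);
    }

    /** Returns all entries that start with the given prefix, in sorted order. */
    List<String> lookup(String prefix) {
        List<String> out = new ArrayList<>();
        // Entries sharing a prefix are contiguous in sorted order,
        // so we can stop at the first one that no longer matches.
        for (String entry : entries.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix)) break;
            out.add(entry);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixLookupSketch phrases = new PrefixLookupSketch();
        phrases.add("token suggestions");
        phrases.add("sugar cane");

        PrefixLookupSketch tokens = new PrefixLookupSketch();
        for (String t : "token suggestions sugar cane".split(" ")) {
            tokens.add(t);
        }

        // Phrase-populated: the whole input is one prefix.
        System.out.println(phrases.lookup("token sug")); // [token suggestions]
        // Token-populated: the last token is completed without context.
        System.out.println(tokens.lookup("sug"));        // [sugar, suggestions]
    }
}
```

In the token-populated case "sugar" is a legitimate completion of "sug" even though "token sugar" is not a meaningful phrase, which is the loss of value described in the comment.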
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788701#action_12788701 ]

Shalin Shekhar Mangar commented on SOLR-1316:
---------------------------------------------

I've started looking into the patch.

# Why do we concatenate all the tokens into one before calling Lookup#lookup? It seems we should be getting suggestions for each token just as SpellCheckComponent does.
# Related to #1, the Lookup#lookup method should return something more fine grained rather than a SpellingResult
# Has anyone done any benchmarking to figure out the data structure we want to go ahead with?

I love that we are (ab)using the SpellCheckComponent. The good part is that if we go this route, this auto-suggest pseudo-component will automatically work with distributed setups.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780530#action_12780530 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------

Re: question 1 - currently this component doesn't support populating the dictionary from a distributed index.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780485#action_12780485 ]

Ankul Garg commented on SOLR-1316:
----------------------------------

Re: Mike

I am answering your 2nd question. Yes, it is possible to auto-suggest separately for separate fields. Create separate NamedList configurations in the solrconfig.xml file, specifying the field name(s) for each configuration. Words from the Solr index will then be extracted only from the specified field name(s) for each configuration, and a separate search tree will be created for each configuration.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780435#action_12780435 ]

Mike Anderson commented on SOLR-1316:
-------------------------------------

Two questions, and apologies if they are addressed in the patches themselves:

1) Will this be supported on distributed setups?

2) (Maybe this is a new ticket.) Is it possible to store the field name in the spellcheck index? The use case for this is: I create a dictionary from the 'person' field and the 'title' field; when using autosuggest it would be nice to separate suggestions into 'person' suggestions and 'title' suggestions.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778297#action_12778297 ]

Ankul Garg commented on SOLR-1316:
----------------------------------

I couldn't find any way to balance the tree dynamically at each insertion, but I am trying to figure out a possible way (maybe the way binary trees are balanced, by dynamically modifying the root of the tree). Until then we can balance it by adding the terms to a List and then inserting them as mentioned above. Or, in case the Dictionary is not sorted and is randomly ordered, a random insertion of strings will also give a roughly balanced tree. We can benchmark it both ways. What do you say?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778295#action_12778295 ]

Andrzej Bialecki commented on SOLR-1316:
----------------------------------------

Well, this is kind of ugly, because it increases the memory footprint of the build phase - that was the whole point of using Iterator in the Dictionary, so that you don't have to cache all dictionary data in memory - dictionaries could be large, and they are not guaranteed to be sorted and with unique keys. But if there are no better options for now, then yes we could do this just in TSTLookup. Is there really no way to rebalance the tree?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778289#action_12778289 ] Ankul Garg commented on SOLR-1316: -- Re: I think we can first add the terms and frequency in separate ArrayLists using the iterator and then start from the middle?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778280#action_12778280 ] Andrzej Bialecki commented on SOLR-1316: - Re: the tree creation - well, this is the current limitation of the Dictionary API, which provides only an Iterator. So in the general case it's not possible to start from the middle of the iterator so that the tree is well-balanced. Is it possible to re-balance the tree on the fly? Re: svn - it works for me ...
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778252#action_12778252 ] Ankul Garg commented on SOLR-1316: -- Andrzej, how are you creating the new patch? The Solr svn server seems to be down!!! Tell me asap, got to update the patch.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778167#action_12778167 ] Ankul Garg commented on SOLR-1316: -- Nice work Andrzej!!! There's a small problem with the insertion of tokens in the build function of the TSTLookup class. Strings in HighFrequencyDictionary must be in sorted order, and simply iterating over the dict and adding the strings to the TST in sorted order will make the tree highly unbalanced. Inserting the strings in the order in which a binary search visits a sorted list (middle element first, then recursively the middle of each half) will make the tree balanced. The balancedTree function of the TSTAutocomplete class takes care of that.
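A minimal sketch of the insertion order described in this comment, assuming the keys have already been collected into a sorted List. The class BalancedOrder and its methods are hypothetical names for illustration; this is not the balancedTree function from the attached TSTAutocomplete class, whose exact code is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not the balancedTree function from the patch. */
class BalancedOrder {
    /**
     * Reorders a sorted list so that the middle element comes first,
     * followed recursively by the middles of the left and right halves.
     * Inserting keys into a TST in this order keeps the lo/hi links balanced.
     */
    static List<String> balancedOrder(List<String> sorted) {
        List<String> out = new ArrayList<>(sorted.size());
        fill(sorted, 0, sorted.size() - 1, out);
        return out;
    }

    private static void fill(List<String> sorted, int lo, int hi, List<String> out) {
        if (lo > hi) return;
        int mid = (lo + hi) >>> 1;   // same midpoint a binary search would probe
        out.add(sorted.get(mid));
        fill(sorted, lo, mid - 1, out);
        fill(sorted, mid + 1, hi, out);
    }
}
```

For the sorted input a, b, c, d, e, f, g this yields d, b, a, c, f, e, g. The cost is that all keys must be held in memory and sorted before the tree is built.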
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777045#action_12777045 ] Andrzej Bialecki commented on SOLR-1316: - Forgot to add - the RadixTree implementation doesn't work for now - it needs further refactoring to return the completed keys, and not just the values stored in nodes ...
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758589#action_12758589 ] Jason Rutherglen commented on SOLR-1316: Ishan, Feel free to post your disk TST implementation, it sounds interesting.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757774#action_12757774 ] Ishan Chattopadhyaya commented on SOLR-1316: For an extremely fast in-memory lookup table, I saw a TrieMap used in one of my projects. In a Trie Map, the nodes of a Hash Map are internally arranged like a Trie. The following implementation is very space efficient: http://airhead-research.googlecode.com/svn/trunk/sspace/src/edu/ucla/sspace/util/ Also, I have a fast on-disk TST implementation based on memory-mapped files. If someone thinks it would be useful, I can submit a patch. :-)
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757688#action_12757688 ] Ankul Garg commented on SOLR-1316: -- For the purpose of benchmarking alone, I employed DFS just to find the number of hits for each autocomplete. But to retrieve complete keys, just create a String key variable in the TST node. In the insert function, where the end boolean is set to true (the termination condition), key can be assigned the complete string. I will modify the code and post it soon. Maybe you can also make the changes as described above.
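The change described in this comment could look roughly like the following sketch, in which each node carries a key field that is non-null only where an inserted string ends, and a depth-first walk below the prefix node collects the complete keys. KeyedTst and its members are hypothetical names; the real node class in the attached TST code may be laid out differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical TST that stores the complete key at each terminal node. */
class KeyedTst {
    static final class Node {
        final char c;
        Node lo, eq, hi;
        String key;              // non-null only where an inserted key ends
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String s) {
        if (!s.isEmpty()) root = insert(root, s, 0);
    }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i + 1 < s.length()) n.eq = insert(n.eq, s, i + 1);
        else n.key = s;          // termination condition: remember the whole string
        return n;
    }

    /** Returns all stored keys that start with the given prefix, in sorted order. */
    List<String> complete(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) { collect(root, out); return out; }
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = prefix.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (++i == prefix.length()) break;   // n matches the last prefix char
            else n = n.eq;
        }
        if (n == null) return out;
        if (n.key != null) out.add(n.key);            // the prefix itself is a key
        collect(n.eq, out);
        return out;
    }

    /** In-order DFS over a subtree, appending every complete key found. */
    private void collect(Node n, List<String> out) {
        if (n == null) return;
        collect(n.lo, out);
        if (n.key != null) out.add(n.key);
        collect(n.eq, out);
        collect(n.hi, out);
    }
}
```

Storing the full string costs one reference per node plus the strings themselves; an alternative would be to rebuild the key from the path during the DFS, which trades memory for lookup time.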
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757592#action_12757592 ] Jason Rutherglen commented on SOLR-1316: The DAWG seems like a potential fit as a replacement for the Lucene term dictionary. It would provide the extra benefit of faster prefix etc. lookups. I believe it could be stored on disk by writing file pointers to the locations of the letters. I found the Stanford lecture on them interesting, though the papers seem to overcomplicate them. I could not find an existing Java implementation. As a generic library I think it could be useful for a variety of Lucene based use cases (i.e. storing terms in a compact form that allows fast lookups, prefix and otherwise).
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756819#action_12756819 ] Andrzej Bialecki commented on SOLR-1316: - Yes, it should work for now. In fact I started writing a new component, but it had to replicate most of the spellchecker ;) so I will just add bits to the existing spellchecker. I'm worried though that we abuse the semantics of the API, and it will be more difficult to fit both functions in a single API as the functionality evolves.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756785#action_12756785 ] Jason Rutherglen commented on SOLR-1316: Andrzej, Is it necessary to create a new abstraction layer? It looks like the SolrSpellChecker abstraction will work?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756605#action_12756605 ] Andrzej Bialecki commented on SOLR-1316: - I started working on a skeleton component for this, so that we can test various ideas and implementations. Patch is coming shortly.
Re: [jira] Commented: (SOLR-1316) Create autosuggest component
Ishan Chattopadhyaya wrote: > Andrzej, I think a splay tree will adjust itself to prioritize frequent user queries. What do you think? Indeed, it will. I'm not sure how splaying affects performance ... from Wikipedia article on splay trees: "however it is important to note that for uniform access, a splay tree's performance will be considerably (although not asymptotically) worse than a somewhat balanced simple binary search tree." I'm working on a patch that hides the implementation behind an abstract class. Once we have this skeleton in place we can test various implementations. -- Best regards, Andrzej Bialecki. Information Retrieval, Semantic Web; Embedded Unix, System Integration. http://www.sigram.com Contact: info at sigram dot com
Re: [jira] Commented: (SOLR-1316) Create autosuggest component
Andrzej, I think a splay tree will adjust itself to prioritize frequent user queries. What do you think? On Wed, Sep 16, 2009 at 11:45 PM, Andrzej Bialecki (JIRA) wrote: > [ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756149#action_12756149 ] > Andrzej Bialecki commented on SOLR-1316: > bq. Andrzej, why would immutability be a problem? Wouldn't we have to re-build the TST if the source index changes? > Well, the use case I have in mind is a TST that improves itself over time based on the observed query log. I.e. you would bootstrap a TST from the index (and here indeed you can do this on every searcher refresh), but it's often claimed that real query logs provide a far better source of autocomplete than the index terms. My idea was to start with what you have - in the absence of query logs - and then improve upon it by adding successful queries (and removing least-used terms to keep the tree at a more or less constant size). > Alternatively we could provide an option to bootstrap it from real query log data. > This use case requires mutability, hence my negative opinion about DAWGs (besides, we are lacking an implementation, aren't we, whereas we already have a few suitable TST implementations). Perhaps this doesn't have to be an either/or, if we come up with a pluggable interface for this type of component? > bq. I think the building of the data structure can be done in a way similar to what SpellCheckComponent does. [..] > +1 -- Ishan Chattopadhyaya Email: is...@chattopadhyaya.com Website: www.ishan.chattopadhyaya.com
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756149#action_12756149 ] Andrzej Bialecki commented on SOLR-1316: - bq. Andrzej, why would immutability be a problem? Wouldn't we have to re-build the TST if the source index changes? Well, the use case I have in mind is a TST that improves itself over time based on the observed query log. I.e. you would bootstrap a TST from the index (and here indeed you can do this on every searcher refresh), but it's often claimed that real query logs provide a far better source of autocomplete than the index terms. My idea was to start with what you have - in the absence of query logs - and then improve upon it by adding successful queries (and removing least-used terms to keep the tree at a more or less constant size). Alternatively we could provide an option to bootstrap it from real query log data. This use case requires mutability, hence my negative opinion about DAWGs (besides, we are lacking an implementation, aren't we, whereas we already have a few suitable TST implementations). Perhaps this doesn't have to be an either/or, if we come up with a pluggable interface for this type of component? bq. I think the building of the data structure can be done in a way similar to what SpellCheckComponent does. [..] +1
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756094#action_12756094 ] Shalin Shekhar Mangar commented on SOLR-1316: - bq. DAWGs are problematic, because they are essentially immutable once created (the cost of insert / delete is very high) Andrzej, why would immutability be a problem? Wouldn't we have to re-build the TST if the source index changes? bq. Also, I think that populating TST from the index would have to be discriminative, perhaps based on a threshold I think the building of the data structure can be done in a way similar to what SpellCheckComponent does. We can re-use the HighFrequencyDictionary, which can give tokens above a certain threshold frequency. The field names to use for building the data structure and the analysis can also be done like SCC. The response format for this component can also be similar to SCC.
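As a rough illustration of the thresholding idea, the sketch below keeps only terms whose document frequency reaches a given fraction of the index size. FrequencyThreshold and frequentTerms are hypothetical names, the term statistics are passed in as a plain Map, and the interpretation of the threshold as a fraction of numDocs is an assumption; the sketch does not call or reproduce HighFrequencyDictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; does not use or mirror Solr's HighFrequencyDictionary API. */
class FrequencyThreshold {
    /**
     * Returns the terms that occur in at least threshold * numDocs documents,
     * in sorted order. The threshold is assumed to be a fraction in [0, 1].
     */
    static List<String> frequentTerms(Map<String, Integer> docFreq, int numDocs, float threshold) {
        int minDocs = (int) (threshold * numDocs);
        List<String> out = new ArrayList<>();
        // TreeMap iteration gives the terms in sorted order
        for (Map.Entry<String, Integer> e : new TreeMap<>(docFreq).entrySet()) {
            if (e.getValue() >= minDocs) out.add(e.getKey());
        }
        return out;
    }
}
```

The surviving terms would then be fed to whichever lookup structure is being built, so rare terms never enter the tree.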
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756072#action_12756072 ] Ankul Garg commented on SOLR-1316: -- Removing keys will not affect the balance of the tree, since a removal can be done simply by setting the end boolean at the key's terminal node to false. Adding keys dynamically won't really keep the tree balanced, as in my implementation the tree is balanced by ordered insertion of keys. So when more keys are added, the TST will have to be rebuilt to make it balanced. Will that be problematic?
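The removal scheme mentioned in this comment can be sketched as follows: the node where the key ends is located and its end flag is cleared, while the nodes themselves stay in place. FlagDeleteTst is a hypothetical class written for this example and is not the attached implementation.

```java
/** Hypothetical TST illustrating removal by clearing the end flag. */
class FlagDeleteTst {
    static final class Node {
        final char c;
        Node lo, eq, hi;
        boolean end;
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String s) {
        if (!s.isEmpty()) root = insert(root, s, 0);
    }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i + 1 < s.length()) n.eq = insert(n.eq, s, i + 1);
        else n.end = true;
        return n;
    }

    /** Finds the node matching the last character of s, or null if the path is absent. */
    private Node find(String s) {
        if (s.isEmpty()) return null;
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = s.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (++i == s.length()) return n;
            else n = n.eq;
        }
        return null;
    }

    boolean contains(String s) {
        Node n = find(s);
        return n != null && n.end;
    }

    /** Logical delete: clears the end flag; the tree shape is left untouched. */
    boolean remove(String s) {
        Node n = find(s);
        if (n == null || !n.end) return false;
        n.end = false;
        return true;
    }
}
```

Because no nodes are unlinked, the shape and balance of the tree are unchanged, but the space used by removed keys is not reclaimed until the tree is rebuilt.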
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756050#action_12756050 ] Andrzej Bialecki commented on SOLR-1316: - bq. These enable suffix compression and create much smaller word graphs. DAWGs are problematic, because they are essentially immutable once created (the cost of insert / delete is very high). So I propose to stick to TSTs for now. Also, I think that populating the TST from the index would have to be discriminative, perhaps based on a threshold (so that it only adds terms with large enough docFreq), and it would be good to adjust the content of the tree based on actual queries that return some results (poor man's auto-learning), gradually removing the least frequent strings to save space. We could also use as a source a field with 1-3 word shingles (no tf, unstored, to save space in the source index, with a similar thresholding mechanism). Ankul, I'm not sure what the behavior of your implementation is when keys are added / removed dynamically. Does it still remain balanced? I also found a MIT-licensed impl. of radix tree here: http://code.google.com/p/radixtree, which looks good too, one spelling mistake in the API notwithstanding ;)
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755039#action_12755039 ] Shalin Shekhar Mangar commented on SOLR-1316: - bq. Note, not sure how it compares, but Lucene has a TST implementation in it already. Yes, at org.apache.lucene.analysis.compound.hyphenation.TernaryTree, but it uses the char type as a pointer, thus limiting it to around 65K nodes. That will not be enough.
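The 65K figure follows from the range of Java's char type, which is an unsigned 16-bit value. The sketch below only demonstrates that arithmetic; CharPointerLimit is a hypothetical class and does not touch the Lucene TernaryTree itself.

```java
/** Hypothetical demo of why a char-typed node index caps a tree at 65,536 nodes. */
class CharPointerLimit {
    /** Number of distinct values a char can hold: 0 .. Character.MAX_VALUE. */
    static int maxAddressableNodes() {
        return Character.MAX_VALUE + 1;
    }

    /** What happens to a node index once it is narrowed to char. */
    static int wrap(int index) {
        return (char) index;   // keeps only the low 16 bits
    }
}
```

An index of 65,536 wraps around to 0 when stored in a char, so any structure that addresses its nodes this way cannot distinguish more than 65,536 of them, which is too few for a dictionary built from a sizeable index.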
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755047#action_12755047 ] Ankul Garg commented on SOLR-1316: -- Also, Lucene's TST implementation doesn't have any method for autocompletion. I had problems understanding TSTs and Lucene's TST implementation, so I felt it's better to code it myself.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755030#action_12755030 ] Grant Ingersoll commented on SOLR-1316: --- Note, not sure how it compares, but Lucene has a TST implementation in it already.
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754624#action_12754624 ]
Jason Rutherglen commented on SOLR-1316:
And a video: Lecture 25 http://see.stanford.edu/see/lecturelist.aspx?coll=11f4f422-5670-4b4c-889c-008262e09e4e
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754623#action_12754623 ]
Jason Rutherglen commented on SOLR-1316:
Ankul, sounds good, feel free to post your ternary tree implementation. There are some other algorithms to think about:
* "Incremental Construction of Minimal Acyclic Finite-State Automata" http://arxiv.org/PS_cache/cs/pdf/0007/0007009v1.pdf
* "Directed acyclic word graph" http://en.wikipedia.org/wiki/Directed_acyclic_word_graph
* "world's fastest Scrabble program" http://www1.cs.columbia.edu/~kathy/cs4701/documents/aj.pdf
These enable suffix compression and create much smaller word graphs.
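To make the suffix compression idea concrete, the following is a rough sketch that builds a plain trie and then merges structurally identical subtrees bottom-up, which turns the trie into a word graph with shared suffixes. This is a simple after-the-fact merge, not the incremental construction algorithm from the paper linked above, and the class SuffixSharingTrie with its methods is invented for this example.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative trie whose identical subtrees can be merged; names are hypothetical. */
class SuffixSharingTrie {
    private static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();
    private int nodeCount = 1;  // nodes allocated by insert, including the root
    private final Map<String, Node> registry = new HashMap<>();
    private final Map<Node, Integer> ids = new IdentityHashMap<>();

    /** Adds a word. Call this only before minimize(); afterwards nodes are shared. */
    public void insert(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            Node child = n.next.get(c);
            if (child == null) {
                child = new Node();
                n.next.put(c, child);
                nodeCount++;
            }
            n = child;
        }
        n.isWord = true;
    }

    public int nodeCount() { return nodeCount; }

    /** Merges equivalent subtrees and returns the number of distinct nodes left. */
    public int minimize() {
        registry.clear();
        ids.clear();
        canonical(root);
        return registry.size();
    }

    /** Returns the representative node for the subtree rooted at n. */
    private Node canonical(Node n) {
        StringBuilder sig = new StringBuilder(n.isWord ? "1" : "0");
        for (Map.Entry<Character, Node> e : n.next.entrySet()) {
            Node child = canonical(e.getValue());
            e.setValue(child);  // point at the shared representative
            char key = e.getKey();
            // Signature: word flag plus (char code, id of canonical child) pairs.
            sig.append('|').append((int) key).append(':').append(ids.get(child));
        }
        Node existing = registry.putIfAbsent(sig.toString(), n);
        if (existing != null) return existing;
        ids.put(n, ids.size());
        return n;
    }

    public boolean contains(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.get(c);
            if (n == null) return false;
        }
        return n.isWord;
    }
}
```

With the words "tap", "taps", "top" and "tops", the plain trie has 8 nodes and the merged graph has 5, because the "p" and "ps" endings are stored once. Lookups work the same way before and after merging.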
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754505#action_12754505 ]
Ankul Garg commented on SOLR-1316:
Hi Shalin sir, my team and I have benchmarked Ternary Tree and Trie for autocomplete, and Ternary Tree gives the best insertion and search times. We have started working on a patch that can be integrated with Solr as a search component. The patch is expected to come soon. :)
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754482#action_12754482 ]
Shalin Shekhar Mangar commented on SOLR-1316:
This is a duplicate of SOLR-706. Since we have comments on both, which one should I close? :)
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754443#action_12754443 ]
Jason Rutherglen commented on SOLR-1316:
Basically we need an algorithm that does suffix compression as well?
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754440#action_12754440 ]
Jason Rutherglen commented on SOLR-1316:
Andrzej, there's the ternary tree, which is supposed to be better? There are other algorithms for compressed dictionaries that could be used (offhand I can't think of them).
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754404#action_12754404 ]
Andrzej Bialecki commented on SOLR-1316:
Jason, did you make any progress on this? I'm interested in this functionality. I'm not sure tries are the best choice: unless heavily pruned, they occupy a lot of RAM. I had some moderate success using an n-gram based method (I reused the spellchecker, with slight modifications). The method is fast and reuses the existing spellchecker index, but the precision of lookups is not ideal.
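As a rough illustration of the n-gram idea and of why its precision can be less than ideal, here is a small in-memory sketch. It is a generic trigram overlap lookup, not the modified spellchecker described in the comment and not Lucene's spellchecker index format; the class NgramSuggester, the "^" start marker and the ranking rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative trigram-overlap suggester; names and ranking are hypothetical. */
class NgramSuggester {
    private static final int N = 3;
    private final Map<String, Set<String>> gramToTerms = new HashMap<>();

    /** Splits "^" + s into overlapping character n-grams; "^" marks the term start. */
    static List<String> grams(String s) {
        String padded = "^" + s;
        List<String> out = new ArrayList<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            out.add(padded.substring(i, i + N));
        }
        return out;
    }

    /** Indexes a term under each of its n-grams. */
    public void add(String term) {
        for (String g : grams(term)) {
            gramToTerms.computeIfAbsent(g, k -> new TreeSet<>()).add(term);
        }
    }

    /** Ranks terms by how many n-grams they share with the typed text. */
    public List<String> suggest(String typed, int max) {
        Map<String, Integer> hits = new TreeMap<>();  // alphabetical key order
        for (String g : grams(typed)) {
            for (String t : gramToTerms.getOrDefault(g, Collections.emptySet())) {
                hits.merge(t, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        // Stable sort: more shared n-grams first, ties stay alphabetical.
        ranked.sort((a, b) -> Integer.compare(hits.get(b), hits.get(a)));
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }
}
```

After adding "solr", "solar" and "resolve", typing "sol" returns "solar" and "solr" first, but "resolve" also comes back because it shares the trigram "sol" even though it does not start with the typed text. That kind of non-prefix match is one plausible source of the precision problem mentioned above.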
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737308#action_12737308 ]
Jason Rutherglen commented on SOLR-1316:
Patch coming...
[jira] Commented: (SOLR-1316) Create autosuggest component
[ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736905#action_12736905 ]
Jason Rutherglen commented on SOLR-1316:
An alternative to the TernaryTree, which does not offer a traverse method, is a Patricia Trie. Conveniently, an Apache-licensed implementation is available at: http://code.google.com/p/patricia-trie/
Further links about Patricia Tries:
* http://en.wikipedia.org/wiki/Radix_tree
* http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA
* http://www.imperialviolet.org/binary/critbit.pdf