[jira] Commented: (SOLR-1316) Create autosuggest component

2010-03-23 Thread jonas stock (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848773#action_12848773
 ] 

jonas stock commented on SOLR-1316:
---

Hello. 
I am having some trouble applying this patch. 
Can anyone send me a short how-to for Windows? 

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1316) Create autosuggest component

2010-03-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844083#action_12844083
 ] 

Grant Ingersoll commented on SOLR-1316:
---

What's the status on this?




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-15 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833786#action_12833786
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

I would lean towards the latter - complex do-it-all components often suffer 
from creeping featuritis and insufficient testing/maintenance, because there 
are few users that use all their features, and few developers that understand 
how they work. I subscribe to the Unix philosophy - do one thing, and do it 
right, so I think that if we can implement autosuggest that works well from the 
technical POV, then it will become a reliable component that you can combine in 
many creative ways to satisfy different scenarios, of which there are likely 
many more than what you described ...




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833566#action_12833566
 ] 

Jan Høydahl commented on SOLR-1316:
---

Found this excellent article on usability of auto suggest: 
http://blog.twigkit.com/search-suggestions-part-1/

A bit more talking about use cases could help us assert that we're solving the 
right problem :)

One complex use case could be:
* First display 0-1 suggestions if some terms are misspelled.
* Then display 0-3 suggestions from the categories field, ordered by most docs 
in that category
* Then display 0-10 generic suggestions based on fields "title,keywords" 
ordered by relevancy and tf/idf

Yet another use case (for shopping sites) is:
* First display 0-3 actual products, if there is an exact match on product 
name. These suggestions are of type "instant result", and must return to the 
frontend all data necessary to display a preview of the instant result. 
Clicking this suggestion takes the user directly to the product page.
* Then display 0-10 suggestions from an index (separate Solr core?) containing 
actual user queries, with offensive content filtered out, sorted by relevancy 
and boosted by frequency

One way to back the suggestions with a user query log is through a full-blown 
Solr core, where, in addition to the terms, you also index metadata such as 
usage frequency. Your auto suggest query could then benefit from all of Solr's 
power in ranking etc.

Do complex scenarios like this belong within one request to one instance of 
the autosuggest component? That would be very powerful. Or is it better to 
require users to build a proxy between the frontend and Solr which issues 
multiple requests to multiple autosuggest URLs and/or queries to ordinary 
indices?




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-09 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831544#action_12831544
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

{quote}Where are we on this - do people feel it's ready to commit?{quote}

It has been some time since I looked at it, but I don't feel it is ready. Using 
it through spellcheck works, but specifying spellcheck params feels odd. Also, 
I don't know how well it compares to regular TermsComponent or facet.prefix 
searches in terms of memory and CPU cost.




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-02-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831490#action_12831490
 ] 

Yonik Seeley commented on SOLR-1316:


Where are we on this - do people feel it's ready to commit?
We probably want to add some unit tests too, and some documentation on the wiki 
at some point.

AFAIK, we're limited to one spellcheck component per request handler - that 
should be OK though, since presumably this is meant to be used on its own, 
right?  What is the recommended/default configuration?  We should probably add 
it as an /autocomplete handler in the example server.

Does this currently work with phrases?




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-01-24 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804325#action_12804325
 ] 

David Smiley commented on SOLR-1316:


For auto-complete, I use Solr's faceting via {{facet.prefix}} as described in 
the Solr book which I authored.  How does this approach differ from that?  I 
suspect that the use of radix-trees and what-not as described here will result 
in lower memory (RAM) requirements.  It would be interesting to see rough RAM 
requirements for the facet prefix approach based on the number of distinct 
terms... though only a fraction of the term space ends up being auto-completed.
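
For comparison, the facet-based approach can be sketched as a request built from Solr's standard faceting parameters (facet, facet.field, facet.prefix, facet.limit, facet.mincount). The field name title_ac, the /select path and the helper class itself are assumptions made for illustration; none of them come from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the query string for a facet.prefix auto-complete request.
class FacetPrefixQuery {
    // field: the facet field to complete against (assumed to hold the terms to suggest)
    // typed: what the user has typed so far
    static String build(String field, String typed, int limit) {
        return "/select?q=*:*&rows=0"          // no documents needed, only facet counts
             + "&facet=true"
             + "&facet.field=" + enc(field)
             + "&facet.prefix=" + enc(typed)   // keep only terms starting with the input
             + "&facet.limit=" + limit         // cap the number of suggestions
             + "&facet.mincount=1";            // skip terms with no matching documents
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Calling FacetPrefixQuery.build("title_ac", "sol", 10) yields a request whose facet counts for title_ac can be shown as completions.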




[jira] Commented: (SOLR-1316) Create autosuggest component

2010-01-04 Thread Brad Giaccio (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796182#action_12796182
 ] 

Brad Giaccio commented on SOLR-1316:


bq. We could do that but as Andrzej noted, we'd end up re-implementing a lot of 
its functionality. I'm not sure if it is worth it. I agree that it'd be odd 
using parameters prefixed with "spellcheck" for auto-suggest and it'd have been 
easier if it were vice-versa. Does anybody have a suggestion?

Couldn't you just extend the SpellCheckComponent, and make use of something 
like COMPONENT_NAME or PARAM_PREFIX in the param calls instead of the static 
strings in SpellingParams?  That way the autosuggest component would have 
COMPONENT_NAME=autoSuggest and the spelling component would have it set to 
spelling, and all the common code could just live in the base class.

Just a thought
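
A minimal sketch of that idea, with everything hypothetical: PrefixedComponent, SpellingComponent and AutoSuggestComponent below are stand-ins written for illustration and are not Solr's actual SpellCheckComponent or SpellingParams API. Only the parameter name spellcheck.count is a real Solr parameter; autoSuggest.count is assumed.

```java
import java.util.Map;

// Hypothetical stand-in for a shared base class; this is NOT Solr's SpellCheckComponent API.
abstract class PrefixedComponent {
    // Each subclass supplies the prefix used for its request parameters.
    abstract String paramPrefix();

    // Common code reads "<prefix>.count" instead of a hard-coded "spellcheck.count".
    int count(Map<String, String> params, int defaultCount) {
        String raw = params.get(paramPrefix() + ".count");
        return raw == null ? defaultCount : Integer.parseInt(raw);
    }
}

class SpellingComponent extends PrefixedComponent {
    String paramPrefix() { return "spellcheck"; }
}

class AutoSuggestComponent extends PrefixedComponent {
    String paramPrefix() { return "autoSuggest"; }
}
```

With this shape, each component only sees parameters under its own prefix, while the parsing logic lives once in the base class.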




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-15 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791164#action_12791164
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

bq. What about DAWGs? Are we still considering them?

I would be happy to include DAWGs if someone were to implement them ... ;)




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-12 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789797#action_12789797
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

{quote}My thinking was that the usual scenario is that you submit autosuggest 
queries soon after user starts typing the query, and the highest perceived 
value of such functionality is when it can suggest complete meaningful phrases 
and not just individual terms. I.e. when you start typing "token sug" it won't 
suggest "token sugar" but instead it will suggest "token suggestions".{quote}

Yes but the decision of selecting the complete phrase or an individual term 
should be up to the user. This is controlled by the "queryAnalyzerFieldType" in 
SpellCheckComponent. We will index tokens returned by that analyzer so the user 
can configure whichever behavior he wants. For example, if it is 
KeywordAnalyzer, we will index/suggest phrases and if it is a 
WhitespaceAnalyzer we will index/suggest individual terms.

{quote}Such as? What you put there is what you get so the fact that we are 
getting complete phrases as suggestions is the consequence of the choice above 
- the trie in this case is populated with phrases. If we populate it with 
tokens, then we can return per-token suggestions, again - losing the added 
value I mentioned above.{quote}

My point was that SpellingResult is too coarse. It is a complete result (for 
all tokens given by "queryAnalyzerFieldType"). If that analyzer gives us 
multiple tokens then we must get suggestions for each. In that case returning a 
SpellingResult for each token is not right. Instead the Suggestor should 
combine suggestions for all tokens into a SpellingResult object. I don't have a 
suggestion on an alternative. Looks like we may need to invent a custom type 
which represents the (suggestion, frequency) pair.
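
One possible shape for such a type, as a rough sketch only: the class name SuggestionEntry, its fields and the ordering rule (higher frequency first, ties broken alphabetically) are all assumptions, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value type for one suggestion; the name and fields are illustrative only.
final class SuggestionEntry implements Comparable<SuggestionEntry> {
    final String text;
    final int frequency;

    SuggestionEntry(String text, int frequency) {
        this.text = text;
        this.frequency = frequency;
    }

    // Higher frequency first; ties broken alphabetically for a deterministic order.
    @Override
    public int compareTo(SuggestionEntry other) {
        int byFreq = Integer.compare(other.frequency, this.frequency);
        return byFreq != 0 ? byFreq : this.text.compareTo(other.text);
    }

    // Returns the texts of the top n entries without modifying the input list.
    static List<String> top(List<SuggestionEntry> entries, int n) {
        List<SuggestionEntry> sorted = new ArrayList<SuggestionEntry>(entries);
        Collections.sort(sorted);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, sorted.size()); i++) {
            out.add(sorted.get(i).text);
        }
        return out;
    }
}
```

A per-token lookup could return a list of these, leaving it to the caller to combine them into a single result object.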

{quote}
For now I'm sure that we do NOT want to use the impl. of RadixTree in this 
patch, because it doesn't support our use case - I'll prepare a patch that 
removes this impl. Other implementations seem comparable with respect to 
speed, based on casual tests using /usr/share/dict/words, but I didn't run any 
exact benchmarks yet.
{quote}

OK. Go ahead with the patch and I'll try to find some time to compare the two 
methods. What about DAWGs? Are we still considering them?

{quote}
Shouldn't we be creating a separate AutoSuggestComponent like the 
SpellCheckComponent, having its own prepare, process and inform functions?
{quote}

We could do that but as Andrzej noted, we'd end up re-implementing a lot of its 
functionality. I'm not sure if it is worth it. I agree that it'd be odd using 
parameters prefixed with "spellcheck" for auto-suggest and it'd have been 
easier if it were vice-versa. Does anybody have a suggestion?

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-11 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789418#action_12789418
 ] 

Ankul Garg commented on SOLR-1316:
--

Shouldn't we be creating a separate AutoSuggestComponent like the 
SpellCheckComponent, having its own prepare, process and inform functions?




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-10 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788913#action_12788913
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Thanks for the review!

bq.  Why do we concatenate all the tokens into one before calling 
Lookup#lookup? It seems we should be getting suggestions for each token just as 
SpellCheckComponent does.

Yeah, it's disputable, and we could change it to use single tokens ... My 
thinking was that the usual scenario is that you submit autosuggest queries 
soon after user starts typing the query, and the highest perceived value of 
such functionality is when it can suggest complete meaningful phrases and not 
just individual terms. I.e. when you start typing "token sug" it won't suggest 
"token sugar" but instead it will suggest "token suggestions".
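
The difference between phrase-level and token-level suggestions can be illustrated with a small sketch. It uses a sorted set as a stand-in for the trie; PrefixDictionary is a hypothetical class written for this illustration and is not the patch's Lookup implementation, and the dictionary contents are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative prefix lookup over a sorted set; a stand-in for the trie, not the patch's Lookup class.
class PrefixDictionary {
    private final TreeSet<String> entries = new TreeSet<String>();

    PrefixDictionary(String... items) {
        entries.addAll(Arrays.asList(items));
    }

    // All entries starting with the prefix, in sorted order.
    List<String> complete(String prefix) {
        // Every string with this prefix sorts in [prefix, prefix + '\uffff').
        return new ArrayList<String>(entries.subSet(prefix, prefix + Character.MAX_VALUE));
    }
}
```

If the dictionary holds the phrases "token stream" and "token suggestions", completing "token sug" returns only "token suggestions". If it holds single terms instead, only the last token "sug" can be completed, and both "sugar" and "suggestions" come back with no knowledge of the preceding word.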

bq. Related to #1, the Lookup#lookup method should return something more fine 
grained rather than a SpellingResult

Such as? What you put there is what you get ;) so the fact that we are getting 
complete phrases as suggestions is the consequence of the choice above - the 
trie in this case is populated with phrases. If we populate it with tokens, 
then we can return per-token suggestions, again - losing the added value I 
mentioned above.

bq. Has anyone done any benchmarking to figure out the data structure we want 
to go ahead with?

For now I'm sure that we do NOT want to use the impl. of RadixTree in this 
patch, because it doesn't support our use case - I'll prepare a patch that 
removes this impl. Other implementations seem comparable with respect to 
speed, based on casual tests using /usr/share/dict/words, but I didn't run any 
exact benchmarks yet.





[jira] Commented: (SOLR-1316) Create autosuggest component

2009-12-10 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788701#action_12788701
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

I've started looking into the patch. 

# Why do we concatenate all the tokens into one before calling Lookup#lookup? 
It seems we should be getting suggestions for each token just as 
SpellCheckComponent does.
# Related to #1, the Lookup#lookup method should return something more fine 
grained rather than a SpellingResult
# Has anyone done any benchmarking to figure out the data structure we want to 
go ahead with?

I love that we are (ab)using the SpellCheckComponent. The good part is that if 
we go this route, this auto-suggest pseudo-component will automatically work 
with distributed setups.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-20 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780530#action_12780530
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Re: question 1 - currently this component doesn't support populating the 
dictionary from a distributed index.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-20 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780485#action_12780485
 ] 

Ankul Garg commented on SOLR-1316:
--

Re: Mike
I am answering your second question. Yes, it is possible to auto-suggest 
separately for separate fields. Create separate NamedList configurations in the 
solrconfig.xml file, specifying the fieldname(s) for each configuration. Words 
from the Solr index will then be extracted only from the specified fieldname(s) 
for each configuration, and a separate search tree will be created for each 
configuration.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-19 Thread Mike Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780435#action_12780435
 ] 

Mike Anderson commented on SOLR-1316:
-

Two questions, and apologies if they are addressed in the patches themselves:

1) Will this be supported on distributed setups?

2) (maybe this is a new ticket) Is it possible to store the field name in the 
spellcheck index? The use case for this is: I create a dictionary from the 
'person' field and the 'title' field. When using autosuggest, it would be nice 
to separate suggestions into 'person' suggestions and 'title' suggestions.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-16 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778297#action_12778297
 ] 

Ankul Garg commented on SOLR-1316:
--

I couldn't find any way to balance the tree dynamically at each insertion, but 
I am trying to figure out a possible way (maybe the way binary trees are 
balanced, by dynamically modifying the root of the tree). Until then we can 
balance it by adding terms to a List and then inserting as mentioned above. Or, 
in case the Dictionary is not sorted and is randomly ordered, a random 
insertion of strings will also give a roughly balanced tree. We can benchmark 
it both ways. What do you say?
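
The list-based workaround can be sketched as follows, assuming "as mentioned above" refers to the usual median-first ordering: sort the terms, emit the middle one first, then recurse into each half. BalancedOrder is a hypothetical helper written for illustration and is not code from the patch; it only computes the insertion order and leaves the tree itself out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: reorder terms so that inserting them one by one yields a roughly balanced search tree.
class BalancedOrder {
    static List<String> of(List<String> terms) {
        List<String> sorted = new ArrayList<String>(terms);
        Collections.sort(sorted);
        List<String> out = new ArrayList<String>(sorted.size());
        addMedianFirst(sorted, 0, sorted.size() - 1, out);
        return out;
    }

    // Emit the middle element, then recurse into the left and right halves.
    private static void addMedianFirst(List<String> sorted, int lo, int hi, List<String> out) {
        if (lo > hi) {
            return;
        }
        int mid = (lo + hi) >>> 1;
        out.add(sorted.get(mid));
        addMedianFirst(sorted, lo, mid - 1, out);
        addMedianFirst(sorted, mid + 1, hi, out);
    }
}
```

For the terms a through g this gives the order d, b, a, c, f, e, g. The cost is that all terms have to be held in memory and sorted before the first insertion.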

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778295#action_12778295
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Well, this is kind of ugly, because it increases the memory footprint of the 
build phase - that was the whole point of using Iterator in the Dictionary, so 
that you don't have to cache all dictionary data in memory - dictionaries could 
be large, and they are not guaranteed to be sorted or to have unique keys.

But if there are no better options for now, then yes we could do this just in 
TSTLookup. Is there really no way to rebalance the tree?

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-16 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778289#action_12778289
 ] 

Ankul Garg commented on SOLR-1316:
--

Re: I think we can first add the terms and frequencies to separate ArrayLists 
using the iterator and then start from the middle?

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778280#action_12778280
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Re: the tree creation - well, this is the current limitation of the Dictionary 
API, which provides only an Iterator. So in the general case it's not possible 
to start from the middle of the iterator so that the tree is well-balanced. Is 
it possible to re-balance the tree on the fly?

Re: svn - it works for me ...

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-16 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778252#action_12778252
 ] 

Ankul Garg commented on SOLR-1316:
--

Andrzej, how are you creating the new patch? The Solr svn server seems to be 
down!!! Tell me asap, got to update the patch.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-15 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778167#action_12778167
 ] 

Ankul Garg commented on SOLR-1316:
--

Nice work Andrzej!!! There's a small problem with the insertion of tokens in 
the build function of the TSTLookup class. Strings in HighFrequencyDictionary 
are presumably in sorted order, and simply iterating over the dict and adding 
the strings to the TST in that order will make the tree highly unbalanced. 
Inserting the strings in the order in which a binary search visits a sorted 
list (middle element first, then each half) will make the tree balanced. The 
function balancedTree of the TSTAutocomplete class takes care of that.
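
The insertion order described above can be sketched as follows. This is an illustration only and not the balancedTree code from TST.zip; the class and method names are invented, and the keys are assumed to be sorted and unique already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real balancedTree function may differ.
class BalancedOrder {
    // Returns the keys of a sorted list in "binary search" order:
    // the middle key first, then the same rule applied to each half.
    static List<String> balancedOrder(List<String> sorted) {
        List<String> out = new ArrayList<>(sorted.size());
        fill(sorted, 0, sorted.size() - 1, out);
        return out;
    }

    private static void fill(List<String> sorted, int lo, int hi, List<String> out) {
        if (lo > hi) return;
        int mid = (lo + hi) >>> 1; // overflow-safe midpoint
        out.add(sorted.get(mid));
        fill(sorted, lo, mid - 1, out);
        fill(sorted, mid + 1, hi, out);
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(balancedOrder(sorted)); // prints [d, b, a, c, f, e, g]
    }
}
```

Inserting keys into the TST in the returned order means each node's first-character siblings are split roughly in half at every level, at the cost of holding the whole key list in memory during the build.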

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-11-12 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777045#action_12777045
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Forgot to add - the RadixTree implementation doesn't work for now - it needs 
further refactoring to return the completed keys, and not just the values 
stored in nodes ...

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758589#action_12758589
 ] 

Jason Rutherglen commented on SOLR-1316:


Ishan,

Feel free to post your disk TST implementation, it sounds interesting.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-20 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757774#action_12757774
 ] 

Ishan Chattopadhyaya commented on SOLR-1316:


For an extremely fast in-memory lookup table, I saw a TrieMap used in one of my 
projects. In a Trie Map, the nodes of a Hash Map are internally arranged like a 
Trie. The following implementation is very space efficient:
http://airhead-research.googlecode.com/svn/trunk/sspace/src/edu/ucla/sspace/util/

Also, I have a fast on-disk TST implementation based on memory-mapped files. If 
someone thinks it would be good, I can submit a patch. :-)

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-19 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757688#action_12757688
 ] 

Ankul Garg commented on SOLR-1316:
--

For the purpose of benchmarking alone, I employed DFS just to find the
number of hits for each autocomplete. But to retrieve complete keys, just
create a string key variable in the TST node. In the insert function, where
the end boolean is set to true (the termination condition), key can be
assigned the complete string. I will modify the code and post it soon.
Maybe you can also make the changes described above.
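
A minimal sketch of that change, written from the description above and not taken from the attached code: each node carries an end flag plus the complete key, so a prefix lookup can return whole strings without rebuilding them during the traversal. The class KeyedTst and its method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical TST; the attached implementation may be organised differently.
class KeyedTst {
    static final class Node {
        final char c;
        Node lo, eq, hi;
        boolean end; // true if some key terminates at this node
        String key;  // the complete key, set only where end is true
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String s) {
        if (!s.isEmpty()) root = insert(root, s, 0);
    }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = insert(n.lo, s, i);
        else if (c > n.c) n.hi = insert(n.hi, s, i);
        else if (i + 1 < s.length()) n.eq = insert(n.eq, s, i + 1);
        else { n.end = true; n.key = s; } // termination: remember the full key
        return n;
    }

    // Returns every stored key that starts with the prefix, in sorted order.
    List<String> complete(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) { collect(root, out); return out; }
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = prefix.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (++i == prefix.length()) break; // n matches the last prefix char
            else n = n.eq;
        }
        if (n == null) return out;
        if (n.end) out.add(n.key);
        collect(n.eq, out);
        return out;
    }

    // In-order walk: smaller characters, this node, longer keys, larger characters.
    private void collect(Node n, List<String> out) {
        if (n == null) return;
        collect(n.lo, out);
        if (n.end) out.add(n.key);
        collect(n.eq, out);
        collect(n.hi, out);
    }

    public static void main(String[] args) {
        KeyedTst tst = new KeyedTst();
        for (String s : new String[] {"solr", "solar", "search", "so", "lucene"}) tst.insert(s);
        System.out.println(tst.complete("so")); // prints [so, solar, solr]
    }
}
```

Storing the key in the node trades some memory per terminal node for a simpler traversal; the alternative is to accumulate characters in a buffer while descending.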




> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: suggest.patch, TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757592#action_12757592
 ] 

Jason Rutherglen commented on SOLR-1316:


The DAWG seems like a potential fit as a replacement for the
Lucene term dictionary. It would provide the extra benefit of
faster prefix etc lookups. I believe it could be stored on disk
by writing file pointers to the locations of the letters. I
found the Stanford lecture on them interesting, though the
papers seem to overcomplicate them. I could not find an existing
Java implementation. 

As a generic library I think it could be useful for a variety of
Lucene based use cases (i.e. storing terms in a compact form
that allows fast lookups, prefix and otherwise). 

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-17 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756819#action_12756819
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Yes, it should work for now. In fact I started writing a new component, but it 
had to replicate most of the spellchecker ;) so I will just add bits to the 
existing spellchecker. I'm worried though that we abuse the semantics of the 
API, and it will be more difficult to fit both functions in a single API as the 
functionality evolves.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-17 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756785#action_12756785
 ] 

Jason Rutherglen commented on SOLR-1316:


Andrzej,

Is it necessary to create a new abstraction layer?  It looks like the 
SolrSpellChecker abstraction will work?



> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-17 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756605#action_12756605
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

I started working on a skeleton component for this, so that we can test various 
ideas and implementations. Patch is coming shortly.

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




Re: [jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-17 Thread Andrzej Bialecki

Ishan Chattopadhyaya wrote:

Andrzej, I think a splay tree will adjust itself to prioritize frequent user
queries. What do you think?


Indeed, it will. I'm not sure how splaying affects performance ... from 
Wikipedia article on splay trees: "however it is important to note that 
for uniform access, a splay tree's performance will be considerably 
(although not asymptotically) worse than a somewhat balanced simple 
binary search tree."


I'm working on a patch that hides the implementation behind an abstract 
class. Once we have this skeleton in place we can test various 
implementations.
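
One possible shape for such an abstraction, sketched with invented names (PrefixLookup, TreeSetLookup); the class in the actual patch may look quite different. The build method takes a plain Iterator of terms, and a TreeSet stands in for a real TST or radix tree implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical abstraction: callers see only build() and lookup().
abstract class PrefixLookup {
    abstract void build(Iterator<String> terms);
    abstract List<String> lookup(String prefix, int max);
}

// Stand-in implementation backed by a sorted set; a TST could replace it
// without any change to the calling code.
class TreeSetLookup extends PrefixLookup {
    private final TreeSet<String> terms = new TreeSet<>();

    @Override
    void build(Iterator<String> it) {
        terms.clear();
        while (it.hasNext()) terms.add(it.next());
    }

    @Override
    List<String> lookup(String prefix, int max) {
        List<String> out = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matches are contiguous from there.
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix) || out.size() >= max) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixLookup lookup = new TreeSetLookup();
        lookup.build(Arrays.asList("solr", "solar", "search", "lucene").iterator());
        System.out.println(lookup.lookup("so", 10)); // prints [solar, solr]
    }
}
```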


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: [jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-17 Thread Ishan Chattopadhyaya
Andrzej, I think a splay tree will adjust itself to prioritize frequent user
queries. What do you think?

On Wed, Sep 16, 2009 at 11:45 PM, Andrzej Bialecki (JIRA)
wrote:

>
>[
> https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756149#action_12756149]
>
> Andrzej Bialecki  commented on SOLR-1316:
> -
>
> bq. Andrzej, why would immutability be a problem? Wouldn't we have to
> re-build the TST if the source index changes?
>
> Well, the use case I have in mind is a TST that improves itself over time
> based on the observed query log. I.e. you would bootstrap a TST from the
> index (and here indeed you can do this on every searcher refresh), but it's
> often claimed that real query logs provide a far better source of
> autocomplete than the index terms. My idea was to start with what you have -
> in the absence of query logs - and then improve upon it by adding successful
> queries (and removing least-used terms to keep the tree at a more or less
> constant size).
>
> Alternatively we could provide an option to bootstrap it from real query
> log data.
>
> This use case requires mutability, hence my negative opinion about DAWGs
> (besides, we are lacking an implementation, aren't we, whereas we already
> have a few suitable TST implementations). Perhaps this doesn't have to be an
> either/or, if we come up with a pluggable interface for this type of
> component?
>
> bq. I think the building of the data structure can be done in a way similar
> to what SpellCheckComponent does. [..]
>
> +1
>
>
> > Create autosuggest component
> > 
> >
> > Key: SOLR-1316
> > URL: https://issues.apache.org/jira/browse/SOLR-1316
> > Project: Solr
> >  Issue Type: New Feature
> >  Components: search
> >Affects Versions: 1.4
> >Reporter: Jason Rutherglen
> >Priority: Minor
> > Fix For: 1.5
> >
> > Attachments: TernarySearchTree.tar.gz
> >
> >   Original Estimate: 96h
> >  Remaining Estimate: 96h
> >
> > Autosuggest is a common search function that can be integrated
> > into Solr as a SearchComponent. Our first implementation will
> > use the TernaryTree found in Lucene contrib.
> > * Enable creation of the dictionary from the index or via Solr's
> > RPC mechanism
> > * What types of parameters and settings are desirable?
> > * Hopefully in the future we can include user click through
> > rates to boost those terms/phrases higher
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


-- 
---
Ishan Chattopadhyaya
Email: is...@chattopadhyaya.com
Website: www.ishan.chattopadhyaya.com
---


[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756149#action_12756149
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

bq. Andrzej, why would immutability be a problem? Wouldn't we have to re-build 
the TST if the source index changes?

Well, the use case I have in mind is a TST that improves itself over time based 
on the observed query log. I.e. you would bootstrap a TST from the index (and 
here indeed you can do this on every searcher refresh), but it's often claimed 
that real query logs provide a far better source of autocomplete than the index 
terms. My idea was to start with what you have - in the absence of query logs - 
and then improve upon it by adding successful queries (and removing least-used 
terms to keep the tree at a more or less constant size).
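
The add-and-evict idea can be sketched independently of any tree structure. The following is only an illustration under assumed names (BoundedQueryCounts, record): it counts successful queries and, once a fixed capacity is exceeded, drops the least-used entry other than the one just added.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping class; a real component would mirror these
// additions and removals into the TST.
class BoundedQueryCounts {
    private final int capacity;
    private final Map<String, Integer> counts = new HashMap<>();

    BoundedQueryCounts(int capacity) { this.capacity = capacity; }

    // Call for each query that returned results (or for each bootstrap term).
    void record(String query) {
        counts.merge(query, 1, Integer::sum);
        if (counts.size() > capacity) evictLeastUsed(query);
    }

    // Linear scan for the lowest count; the newcomer is spared so that new
    // queries get a chance to accumulate hits.
    private void evictLeastUsed(String justAdded) {
        String victim = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().equals(justAdded)) continue;
            if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
        }
        if (victim != null) counts.remove(victim);
    }

    boolean contains(String query) { return counts.containsKey(query); }

    int size() { return counts.size(); }

    public static void main(String[] args) {
        BoundedQueryCounts c = new BoundedQueryCounts(2);
        for (int i = 0; i < 3; i++) c.record("solr");
        for (int i = 0; i < 2; i++) c.record("lucene");
        c.record("nutch"); // over capacity: "lucene" has the lowest count among older entries
        System.out.println(c.contains("lucene")); // prints false
    }
}
```

The linear eviction scan is fine for a sketch; a production version would more likely keep a priority queue or evict in batches.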

Alternatively we could provide an option to bootstrap it from real query log 
data.

This use case requires mutability, hence my negative opinion about DAWGs 
(besides, we are lacking an implementation, aren't we, whereas we already have 
a few suitable TST implementations). Perhaps this doesn't have to be an 
either/or, if we come up with a pluggable interface for this type of component?

bq. I think the building of the data structure can be done in a way similar to 
what SpellCheckComponent does. [..]

+1


> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756094#action_12756094
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

bq. DAWGs are problematic, because they are essentially immutable once created 
(the cost of insert / delete is very high)

Andrzej, why would immutability be a problem? Wouldn't we have to re-build the 
TST if the source index changes?

bq. Also, I think that populating TST from the index would have to be 
discriminative, perhaps based on a threshold

I think the building of the data structure can be done in a way similar to what 
SpellCheckComponent does. We can re-use the HighFrequencyDictionary, which can 
give tokens above a certain threshold frequency. The choice of field names for 
building the data structure, and the analysis, can also be handled as in SCC. 
The response format for this component can also be similar to SCC.
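
As an illustration of the thresholding idea only (this does not call the real HighFrequencyDictionary class), a term can be kept when the fraction of documents containing it reaches a threshold. The method name, the map of document frequencies, and the sample numbers are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a frequency-thresholded dictionary.
class ThresholdFilter {
    // Keeps terms whose docFreq / numDocs is at least the threshold.
    static List<String> frequentTerms(Map<String, Integer> docFreqs, int numDocs, float threshold) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if ((float) e.getValue() / numDocs >= threshold) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new LinkedHashMap<>(); // keeps insertion order
        docFreqs.put("solr", 50);
        docFreqs.put("lucene", 5);
        docFreqs.put("the", 100);
        System.out.println(frequentTerms(docFreqs, 100, 0.2f)); // prints [solr, the]
    }
}
```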

> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756072#action_12756072
 ] 

Ankul Garg commented on SOLR-1316:
--

Removing keys will not affect the balance of the tree, since a key can be
removed simply by setting the end boolean at its final node to false. Adding
keys dynamically won't keep the tree balanced in my implementation, because
the tree is balanced by ordered insertion of keys. So when more keys are
added, the TST will have to be rebuilt to make it balanced again. Will
that be problematic?




> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-16 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756050#action_12756050
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

bq. These enable suffix compression and create much smaller word graphs.

DAWGs are problematic, because they are essentially immutable once created (the 
cost of insert / delete is very high). So I propose to stick to TSTs for now.

Also, I think that populating the TST from the index would have to be 
discriminative, perhaps based on a threshold (so that it only adds terms with 
large enough docFreq), and it would be good to adjust the content of the tree 
based on actual queries that return some results (poor man's auto-learning), 
gradually removing the least frequent strings to save space. We could also use 
as a source a field with 1-3 word shingles (no tf, unstored, to save space in 
the source index, with a similar thresholding mechanism).
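
For readers unfamiliar with the term, 1-3 word shingles are all runs of one, two, or three consecutive words. A plain-Java sketch, independent of any Lucene analysis classes and using an invented helper name, might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; whitespace tokenisation is an assumption made for brevity.
class Shingles {
    // Returns every run of 1..maxSize consecutive words, in order of starting position.
    static List<String> shingles(String text, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return out;
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxSize && start + n < words.length; n++) {
                if (n > 0) sb.append(' ');
                sb.append(words[start + n]);
                out.add(sb.toString()); // shingle of n + 1 words
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("apache solr search", 3));
        // prints [apache, apache solr, apache solr search, solr, solr search, search]
    }
}
```

Indexing such shingles lets the suggester complete multi-word phrases, at the cost of a much larger term set, which is why the same frequency threshold would apply.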

Ankul, I'm not sure what the behavior of your implementation is when 
dynamically adding / removing keys. Does it still remain balanced?

I also found an MIT-licensed impl. of a radix tree here: 
http://code.google.com/p/radixtree, which looks good too, one spelling mistake 
in the API notwithstanding ;)


> Create autosuggest component
> 
>
> Key: SOLR-1316
> URL: https://issues.apache.org/jira/browse/SOLR-1316
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755039#action_12755039
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

bq. Note, not sure how it compares, but Lucene has a TST implementation in it 
already. 

Yes, at org.apache.lucene.analysis.compound.hyphenation.TernaryTree, but it 
uses the char type as a pointer, thus limiting it to around 65K nodes. That 
will not be enough.
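
For reference, the 65K figure follows from the width of Java's char type 
(16 bits, unsigned). A small sketch, with a hypothetical class name, that 
only shows the arithmetic and says nothing about TernaryTree internals:

```java
// Illustration of the value range of char; CharPointerLimit is a made-up name.
class CharPointerLimit {

    /** Number of distinct values a Java char can hold (0 .. 65535). */
    static int distinctCharValues() {
        return Character.MAX_VALUE + 1; // 65535 + 1
    }

    /** A char index wraps around to 0 when incremented past its maximum. */
    static char wrapAround() {
        return (char) (Character.MAX_VALUE + 1);
    }
}
```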




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-14 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755047#action_12755047
 ] 

Ankul Garg commented on SOLR-1316:
--

Also, Lucene's TST implementation doesn't have any method for autocompletion.
I had problems understanding TSTs and Lucene's TST implementation, so I
felt it's better to code it myself.
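
To make the discussion concrete, here is a minimal sketch of a ternary 
search tree with a prefix-completion method. It is an illustration written 
for this thread's topic, not the attached implementation and not Lucene's 
TernaryTree; the names SimpleTst, insert and complete are made up, and 
balancing, deletion and ranking are not handled:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class and method names are hypothetical.
class SimpleTst {

    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the key, larger char
        boolean isKey;     // true if a stored key ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    /** Inserts a non-empty key. */
    void insert(String key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key must be non-empty");
        }
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, key, i);
        else if (c > node.ch) node.hi = insert(node.hi, key, i);
        else if (i < key.length() - 1) node.eq = insert(node.eq, key, i + 1);
        else node.isKey = true;
        return node;
    }

    /** Returns all stored keys that start with the prefix, in sorted order. */
    List<String> complete(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix == null || prefix.isEmpty()) {
            collect(root, new StringBuilder(), out);
            return out;
        }
        Node node = find(prefix);
        if (node == null) return out;
        if (node.isKey) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    /** Walks to the node that matches the last character of the key, or null. */
    private Node find(String key) {
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i == key.length() - 1) return node;
            else { node = node.eq; i++; }
        }
        return null;
    }

    /** In-order traversal that appends every key below the node to out. */
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node == null) return;
        collect(node.lo, path, out);
        path.append(node.ch);
        if (node.isKey) out.add(path.toString());
        collect(node.eq, path, out);
        path.setLength(path.length() - 1);
        collect(node.hi, path, out);
    }
}
```

For example, after inserting "solr", "sole" and "solar", calling 
complete("sol") walks to the node for 'l' and collects everything below it.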







[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-14 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755030#action_12755030
 ] 

Grant Ingersoll commented on SOLR-1316:
---

Note, not sure how it compares, but Lucene has a TST implementation in it 
already.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-12 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754624#action_12754624
 ] 

Jason Rutherglen commented on SOLR-1316:


And a video:
Lecture 25
http://see.stanford.edu/see/lecturelist.aspx?coll=11f4f422-5670-4b4c-889c-008262e09e4e





[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-12 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754623#action_12754623
 ] 

Jason Rutherglen commented on SOLR-1316:


Ankul, sounds good, feel free to post your ternary tree implementation.

There are some other algorithms to think about:
"Incremental Construction of Minimal Acyclic Finite-State Automata"
http://arxiv.org/PS_cache/cs/pdf/0007/0007009v1.pdf

"Directed acyclic word graph"
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph

"worlds fastest scrabble program"
http://www1.cs.columbia.edu/~kathy/cs4701/documents/aj.pdf

These enable suffix compression and create much smaller word graphs.
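
A rough sketch of why suffix sharing shrinks the graph: build a plain trie, 
then count how many nodes remain if nodes with identical subtrees are merged. 
This is not the incremental construction from the paper above, only a 
bottom-up comparison for illustration; SuffixSharingDemo and its methods are 
made-up names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: counts trie nodes before and after merging equal subtrees.
class SuffixSharingDemo {

    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    /** Number of nodes in the plain trie, including the root. */
    int trieNodes() {
        return count(root);
    }

    private int count(Node n) {
        int total = 1;
        for (Node child : n.children.values()) total += count(child);
        return total;
    }

    /** Number of distinct subtrees, i.e. nodes left after merging equivalent ones. */
    int mergedNodes() {
        Map<String, Integer> ids = new HashMap<>();
        canonicalId(root, ids);
        return ids.size();
    }

    /** Assigns the same id to nodes with the same word flag and the same labelled children. */
    private int canonicalId(Node n, Map<String, Integer> ids) {
        StringBuilder sig = new StringBuilder(n.isWord ? "1" : "0");
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            int childId = canonicalId(e.getValue(), ids);
            sig.append('|').append((int) e.getKey().charValue()).append(':').append(childId);
        }
        String key = sig.toString();
        Integer existing = ids.get(key);
        if (existing != null) return existing;
        int id = ids.size();
        ids.put(key, id);
        return id;
    }
}
```

With the words "tap", "taps", "top" and "tops", the trie has 8 nodes, while 
merging the shared "p"/"ps" endings leaves 5.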




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-12 Thread Ankul Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754505#action_12754505
 ] 

Ankul Garg commented on SOLR-1316:
--

Hi Shalin sir,
My team and I have successfully benchmarked Ternary Tree and Trie for 
autocomplete, and Ternary Tree gives the best insertion and search times. We 
have started working on a patch that can be integrated with Solr as a 
search component. The patch is expected to come soon. :)




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754482#action_12754482
 ] 

Shalin Shekhar Mangar commented on SOLR-1316:
-

This is a duplicate of SOLR-706.

Since we have comments on both, which one should I close? :)




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-11 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754443#action_12754443
 ] 

Jason Rutherglen commented on SOLR-1316:


Basically we need an algorithm that does suffix compression as well?




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-11 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754440#action_12754440
 ] 

Jason Rutherglen commented on SOLR-1316:


Andrzej,

There's the ternary tree, which is supposed to be better? There are other 
algorithms for compressed dictionaries that could be used (offhand I can't 
think of them).




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-09-11 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754404#action_12754404
 ] 

Andrzej Bialecki  commented on SOLR-1316:
-

Jason, did you make any progress on this? I'm interested in this 
functionality. I'm not sure tries are the best choice: unless heavily pruned, 
they occupy a lot of RAM. I had some moderate success using an ngram-based 
method (I reused the spellchecker, with slight modifications). The method is 
fast and reuses the existing spellchecker index, but the precision of lookups 
is not ideal.




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-07-30 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737308#action_12737308
 ] 

Jason Rutherglen commented on SOLR-1316:


Patch coming...




[jira] Commented: (SOLR-1316) Create autosuggest component

2009-07-29 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736905#action_12736905
 ] 

Jason Rutherglen commented on SOLR-1316:


An alternative to the TernaryTree, which does not offer a
traverse method, is a Patricia Trie, which conveniently has been
Apache licensed and implemented at:
http://code.google.com/p/patricia-trie/

Further links about Patricia Tries:

* http://en.wikipedia.org/wiki/Radix_tree

* http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA

* http://www.imperialviolet.org/binary/critbit.pdf
