On Thu, Nov 25, 2010 at 9:48 AM, Alexander Klimetschek <aklim...@adobe.com> wrote:
> On 24.11.10 22:29, "Ard Schrijvers" <a.schrijv...@onehippo.com> wrote:
>
>> On Wed, Nov 24, 2010 at 10:03 PM, Zhou Wu <zwu...@yahoo.com> wrote:
>>> I'm trying to do something like
>>> org.apache.jackrabbit.core.query.lucene.spell.SpellChecker for autocomplete:
>>> when the user types in the search input box, a list of words (phrases) pops
>>> up, like Google suggestions. I searched the web and found
>>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>>> which looks helpful. But I don't know how to get it working with
>>> Jackrabbit. Could anyone give some tips? Thanks,
>>
>> Afaiu, SpellChecker wouldn't fit auto completion. Auto completion is
>> about suggesting existing terms in the index after you have typed, say,
>> 'jack'.
>
> Exactly, spellcheck is about getting from "jeck" to "jack", but
> autocompletion (in its hardest form) is about getting from typing a "j"
> to a list like "jack, jupiter, jelly, january, ...".
>
> Also, there are different use cases as to what to show in auto-completion
> (always showing all possibilities doesn't work ;-)) and it is language-
> and region-dependent.
>
> Since those few-letter inputs like "j" will be the most frequent ones, as
> people type words one by one, you want to look up those terms from a
> pre-built index as directly as possible. For this, you can have something
> like "j/ja/jac" in the repository. On each level there is a multi-value
> property containing the auto-completions/suggestions you want to show
> (10 is a good number, for example, as used by Google).
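(For illustration only: a minimal JCR sketch of a lookup against such a
prefix structure, assuming JCR 2.0, a tree rooted at /autocomplete and a
multi-valued "suggestions" property on each level. The root path, the
property name and the helper class are made-up examples, not anything
Jackrabbit ships with.)

import java.util.ArrayList;
import java.util.List;

import javax.jcr.Node;
import javax.jcr.PathNotFoundException;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.Value;

public class PrefixSuggestions {

    // Walk the pre-built prefix tree for the typed input, e.g. the input
    // "jac" maps to /autocomplete/j/ja/jac, and return the stored suggestions.
    public static List<String> suggest(Session session, String input)
            throws RepositoryException {
        List<String> result = new ArrayList<String>();
        StringBuilder path = new StringBuilder("/autocomplete");
        for (int i = 1; i <= input.length(); i++) {
            path.append('/').append(input.substring(0, i));
        }
        try {
            Node prefixNode = session.getNode(path.toString());
            for (Value value : prefixNode.getProperty("suggestions").getValues()) {
                result.add(value.getString());
            }
        } catch (PathNotFoundException e) {
            // nothing stored yet for this prefix
        }
        return result;
    }
}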
Ah, you suggest manually keeping track of the 'auto-suggest' list, right?
Just read them all in once, have an observer for changes, et voila. That
works, but I want to deliver the feature in a different way: expose the
Lucene term enum as a virtual hierarchical node tree, where every node is
a single letter (a rough sketch of such a prefix scan is at the bottom of
this mail). This is very efficient and easy to build once virtual layers
are up and running. The only thing I am still struggling with is Lucene
stemming: the term enum then contains stemmed words. OTOH, imo, the whole
stemming concept in Lucene has been broken from the start; I never advise
stemming. Removing diacritics is enough. (Lucene 4.0 won't need stemming
at all any more, as you can do everything with fuzzy searches thanks to a
new bleeding-edge automaton query... first upgrade Jackrabbit, however :-))

Regards Ard

> How this index is built in the first place depends on the use case. For
> example, Google search shows you terms that are currently popular, so
> they probably update that index based on query statistics, say once or
> twice a day. To start, you can use a dictionary, filter out stop words
> like "the", "and" etc. and build that index automatically. Then you only
> get single words - Google also shows full searches, like "jack wolfskin".
> And there are probably many other sources you can build such an index from.
>
> Hope that helps,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>

--
Hippo Europe • Amsterdam Oosteinde 11 • 1017 WT Amsterdam • +31 (0)20 522 4466
USA • San Francisco 185 H Street Suite B • Petaluma CA 94952-5100 • +1 (707) 773 4646
Canada • Montréal 5369 Boulevard St-Laurent • Montréal QC H2T 1S5 • +1 (514) 316 8966
www.onehippo.com • www.onehippo.org • i...@onehippo.com
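(For completeness: a rough sketch of the term-enum prefix scan described
above, using the pre-4.0 Lucene TermEnum API. The field name, the cap of
10 results and the helper class are illustrative assumptions, not existing
Jackrabbit code.)

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class TermEnumSuggestions {

    // Enumerate index terms of one field that start with the given prefix,
    // i.e. the same walk a virtual "one letter per node" tree would expose.
    public static List<String> suggest(IndexReader reader, String field, String prefix)
            throws IOException {
        List<String> result = new ArrayList<String>();
        // the enum returned by terms(Term) is already positioned at the
        // first term on or after the given prefix
        TermEnum terms = reader.terms(new Term(field, prefix));
        try {
            do {
                Term term = terms.term();
                if (term == null || !term.field().equals(field)
                        || !term.text().startsWith(prefix)) {
                    break;
                }
                result.add(term.text());
            } while (result.size() < 10 && terms.next());
        } finally {
            terms.close();
        }
        return result;
    }
}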