At Netflix, we load the completion lexicon with movie titles, person names, and a few aliases. Even then, we find a few misspellings in our metadata (is it "NWA" or "N.W.A."?). Extracting terms from the documents themselves will pick up a lot more misspellings.
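A curated completion lexicon like this can be served from a sorted in-memory map. The sketch below is a hypothetical illustration, not the Netflix implementation: the class `CompletionLexicon`, its punctuation-stripping normalization (which makes "NWA" and "N.W.A." share one key), and the alias handling are all assumptions. A prefix lookup is a single range scan over the sorted keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/**
 * Hypothetical in-memory completion lexicon: normalized key -> display string.
 * Aliases are extra keys that point at the canonical display string.
 */
public class CompletionLexicon {
    private final TreeMap<String, String> entries = new TreeMap<>();

    // Lowercase and drop punctuation so "N.W.A." and "NWA" normalize to "nwa".
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N} ]", "").trim();
    }

    /** Add a canonical entry such as a title or a person name. */
    public void add(String display) {
        entries.put(normalize(display), display);
    }

    /** Add an alternate spelling or name that completes to an existing entry. */
    public void addAlias(String alias, String display) {
        entries.put(normalize(alias), display);
    }

    /** Up to {@code limit} distinct display strings whose key starts with the prefix, in key order. */
    public List<String> complete(String prefix, int limit) {
        String p = normalize(prefix);
        List<String> out = new ArrayList<>();
        // Every key in [p, p + '\uffff') starts with p, and the TreeMap returns them sorted.
        for (String display : entries.subMap(p, true, p + Character.MAX_VALUE, false).values()) {
            if (out.size() >= limit) break;
            if (!out.contains(display)) out.add(display);
        }
        return out;
    }
}
```

For example, after `add("Rain Man")` and `add("Ratatouille")`, `complete("ra", 10)` returns `[Rain Man, Ratatouille]`, and with `add("N.W.A.")` both `complete("nwa", 5)` and `complete("N.W", 5)` return `[N.W.A.]`. Results here come back alphabetically; ranking by popularity would need a score per entry.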
You really do not want to rely on random users to correctly spell things like Ratatouille and Koyaanisqatsi. Trust me.

Autocomplete needs to be really fast, so we use a dedicated in-memory index (RAMDirectory) in the front-end webapp and also use an HTTP cache in the load balancer. We get at least 25 million autocomplete requests a day, more than 10X the number of search requests. I would plan for autocomplete traffic of 10-15X your search traffic.

wunder

On 12/19/08 10:45 AM, "Grant Ingersoll" <gsing...@apache.org> wrote:

> I'd add that you probably don't want just the query logs; people may search
> for things that aren't in the index, too. Your call as to whether
> that is useful or not. Also, have a look at the TermsComponent, as it
> will tell you the doc freq for terms.
>
> On Dec 19, 2008, at 10:08 AM, roberto wrote:
>
>> Erick,
>>
>> Thanks, this sounds good, I'll try.
>>
>> Mike,
>>
>> Could you give more details about query logs?
>>
>> Thanks
>>
>> On Fri, Dec 19, 2008 at 12:02 AM, Mike Klaas <mike.kl...@gmail.com>
>> wrote:
>>
>>> On 18-Dec-08, at 10:53 AM, roberto wrote:
>>>
>>>> Erick,
>>>>
>>>> Thanks for the answer. Let me clarify: we would like to have a
>>>> combobox with the terms to guide the user in the search. I mean, if
>>>> I have thousands of documents and want to tell them how many documents
>>>> in the base have a particular word, how can I do that?
>>>
>>> Sounds like you want query autocomplete. The best way to do this
>>> (including if you want the box filled with some queries) is to use
>>> the query logs, not the documents.
>>>
>>> -Mike
>>>
>> --
>> "Without love, we are birds with broken wings."
>> Morrie
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
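Mike's suggestion (build completions from the query logs) and Grant's caveat (people also search for things that aren't in the index) can be combined roughly as follows. This is a minimal sketch, assuming the log is already available as raw query strings; `QueryLogSuggester` and its methods are hypothetical. The `indexedTerms` set stands in for a doc-frequency check, which in Solr could come from the TermsComponent Grant mentions; pass `null` to keep queries that match nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical query-log suggester: completions ranked by how often each query was issued. */
public class QueryLogSuggester {
    private final Map<String, Integer> counts = new HashMap<>();

    // Trim, lowercase, and collapse internal whitespace so trivially different queries merge.
    static String normalize(String q) {
        return q.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Count one logged query; blank queries are ignored. */
    public void log(String rawQuery) {
        String q = normalize(rawQuery);
        if (!q.isEmpty()) counts.merge(q, 1, Integer::sum);
    }

    /**
     * Logged queries starting with the prefix, most frequent first (ties alphabetical).
     * If indexedTerms is non-null, queries not in that set (i.e. no matching documents) are dropped.
     */
    public List<String> suggest(String prefix, int limit, Set<String> indexedTerms) {
        String p = normalize(prefix);
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            boolean inIndex = indexedTerms == null || indexedTerms.contains(e.getKey());
            if (e.getKey().startsWith(p) && inIndex) hits.add(e);
        }
        hits.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, hits.size()); i++) out.add(hits.get(i).getKey());
        return out;
    }
}
```

With the log `ratatouille`, `Ratatouille `, `rain man`, `ratatoulle`, `ratatouille`, the call `suggest("rat", 10, null)` returns `[ratatouille, ratatoulle]`: the user misspelling survives unless it is filtered out, for example with `suggest("ra", 10, Set.of("ratatouille", "rain man"))`, which returns `[ratatouille, rain man]`. A linear scan is fine for a sketch; at the request volumes described above, the ranked completions would more likely be precomputed into a sorted, in-memory structure and cached.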