On Wed, Apr 18, 2012 at 10:21 AM, Neil Van Dyke <n...@neilvandyke.org> wrote:
> Pedro wrote at 04/17/2012 04:21 PM:
>
>> My first question is: which kind of data structure should I
>> use in order to perform such a quick search? I'm guessing I should
>> split my notes' data into words and store each single word in some
>> kind of tree. But should I just jam every single word in the memory?
>> Is this a common practice?
>>
>
>
> I think the subfield you're looking for is called "information retrieval",
> and there are textbooks on it. You'll want to use multiple IR techniques
> together to get good search results. (Although a simple stemming function
> and TFIDF will get you a long way.) Which techniques you choose influences
> your choice of data structures / stores.
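
To make the "stemming plus TF-IDF" suggestion concrete, here is a minimal sketch of the usual data structure for this kind of search: an inverted index that maps each stemmed term to the notes containing it, with results ranked by one common TF-IDF variant (term frequency normalized by note length, times log(N/df)). It is written in Python rather than Racket, the whole index is held in memory, and all names are illustrative only. `stem` in particular is a hypothetical, deliberately crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import math
import re
from collections import Counter, defaultdict


def stem(word):
    # Hypothetical, deliberately crude stemmer: strip one common English
    # suffix, keeping at least 3 characters. A real system would use a
    # proper algorithm such as the Porter stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split on anything that is not a letter or digit, then stem.
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: count of term in doc}
        self.doc_lengths = {}              # doc_id -> number of tokens in doc

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def idf(self, term):
        # Inverse document frequency: log(N / df); 0 for unseen terms.
        n_docs = len(self.doc_lengths)
        df = len(self.postings.get(term, {}))
        return math.log(n_docs / df) if df else 0.0

    def search(self, query, limit=10):
        # Only documents that share at least one term with the query are
        # touched, which is what makes an inverted index fast to search.
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            idf = self.idf(term)
            for doc_id, count in self.postings.get(term, {}).items():
                tf = count / self.doc_lengths[doc_id]
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("groceries", "Buy eggs, milk and bread")
    index.add("racket", "Reading the Racket guide on hash tables")
    index.add("errands", "Buy stamps and mail the letters")
    print(index.search("hash tables"))  # only the "racket" note matches
    print(index.search("buy bread"))    # "groceries" ranks above "errands"
```

Because "buy" occurs in two of the three notes, it gets a lower IDF than "bread", so the note containing both words ranks first. A plain dict of dicts keeps the sketch short; a larger collection would likely need an on-disk store.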

As a possible place to get started, I wrote a little TF-IDF program in
Racket a while back: https://gist.github.com/1157123 , based on a
similar one in Clojure.

-- 
sam th
sa...@ccs.neu.edu
____________________
  Racket Users list:
  http://lists.racket-lang.org/users