Hi,

Instead of searching for arbitrary terms, I'd like to match documents against a list of pre-defined terms.
The actual application:

Inputs:
1. Wikipedia's list of article titles
2. RSS/Atom feeds

I'll get the permalink URLs from the feeds, then fetch and index the pages with Nutch.

Output:
1. A list of URLs and the Wikipedia article titles they contain

Of course, with Nutch + Lucene as it is, I can iterate through the list of titles and run a search for each one, but that's not very efficient. Is anyone working on similar applications?
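To make the idea concrete, here is a minimal sketch of the inverse approach: instead of running one query per title, load all titles into a token trie once and scan each fetched page a single time. This is plain Python, not Nutch or Lucene API; the function names (tokenize, build_trie, find_titles) and the sample titles are hypothetical, and the lowercase word tokenization is an assumption that ignores punctuation and disambiguation suffixes in real Wikipedia titles.

```python
import re

END = ""  # sentinel key marking "a title ends here"; real tokens are never empty


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def build_trie(titles):
    """Build a nested-dict trie keyed by token, one path per title."""
    root = {}
    for title in titles:
        tokens = tokenize(title)
        if not tokens:
            continue  # skip titles with no word characters
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = title
    return root


def find_titles(text, trie):
    """Return the set of titles whose token sequence occurs in the text."""
    tokens = tokenize(text)
    found = set()
    for start in range(len(tokens)):
        node = trie
        pos = start
        # Walk the trie for as long as the page tokens keep matching.
        while pos < len(tokens) and tokens[pos] in node:
            node = node[tokens[pos]]
            pos += 1
            if END in node:
                found.add(node[END])
    return found


titles = ["Apache Lucene", "Search engine", "Lucene"]
trie = build_trie(titles)
page = "Nutch builds on Apache Lucene, a search engine library."
print(sorted(find_titles(page, trie)))
```

The cost per page depends on the page length and the longest title, not on the number of titles, which is the property a per-title search loop lacks. Within Nutch, the same lookup would presumably run at indexing time, but where exactly to hook it in is left open here.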
