To answer your question, some more information is needed:
1) How do you decide which "topic" a particular page belongs to? URL segments? The Title? Other html page elements? Latent Semantic Analysis (http://en.wikipedia.org/wiki/Latent_semantic_indexing)?
2) Given a topic, how will your end users find pages on this topic? Search? Link navigation? Hierarchical categories?
3) If the answer to question 2 was "search", how is your topic search different from the standard Nutch search?
4) Do you control all the content or the servers hosting the content (like in an Intranet)?

I ask these because your question, though simply stated, is not necessarily an easy problem to solve. Any solution will probably require hooking into Nutch at several different locations.

Also, I'm curious as to why you want topic based search. Are you trying to provide clustered results like Vivisimo (http://vivisimo.com/)?

-- Jim

On 9/14/06, suxiaoke79 <[EMAIL PROTECTED]> wrote:
I want to build a topic-based search engine by modifying Nutch. For example, if I define a "computer" topic, I want to find only information about computers. I can't find the appropriate point in Fetcher.java where I can insert my own code. Could you tell me how to modify the Fetcher and the parser? Thanks.
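
On question 1 above: if the topic can be decided from URL segments or the page title alone, the check itself might look something like the sketch below. This is only an illustration under assumptions. TopicFilter and its methods are made-up names, not Nutch classes, and the sketch does not show where to hook into Fetcher.java or the parser. It uses exact keyword matches, so "computers" would not match "computer".

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Nutch: decides whether a page looks on-topic
// from its URL path segments and its title.
final class TopicFilter {
    private final Set<String> keywords; // topic keywords, expected in lowercase

    TopicFilter(Set<String> keywords) {
        this.keywords = keywords;
    }

    // True if any URL path segment or any word in the title equals a topic keyword.
    // URI.create throws IllegalArgumentException for a malformed URL.
    boolean matches(String url, String title) {
        String path = URI.create(url).getPath();
        if (path == null) {
            path = ""; // opaque URIs have no path
        }
        return containsKeyword(path.split("/"))
            || containsKeyword(title.split("\\W+"));
    }

    // Case-insensitive exact match of each token against the keyword set.
    private boolean containsKeyword(String[] tokens) {
        return Arrays.stream(tokens)
                .map(t -> t.toLowerCase(Locale.ROOT))
                .anyMatch(keywords::contains);
    }

    public static void main(String[] args) {
        TopicFilter filter = new TopicFilter(Set.of("computer", "software"));
        System.out.println(filter.matches("http://example.com/computer/news.html", "Daily news"));
        System.out.println(filter.matches("http://example.com/sports/", "Football scores"));
    }
}
```

A keyword test like this is the simplest possible answer to question 1. Anything based on page content beyond the title, or on Latent Semantic Analysis, would need considerably more than this.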
