To answer your question, some more information is needed:

1) How do you decide which "topic" a particular page belongs to?  URL
segments?  The title?  Other HTML page elements?  Latent semantic analysis
(http://en.wikipedia.org/wiki/Latent_semantic_indexing)?

2) Given a topic, how will your end users find pages on this topic?
Search?  Link navigation?  Hierarchical categories?

3) If the answer to question 2 was "search", how is your topic search
different from the standard Nutch search?

4) Do you control all the content or the servers hosting the content (as
on an intranet)?

I ask these because your question, though simply stated, does not
necessarily describe an easy problem to solve.  Any solution will probably
require hooking into Nutch at several different locations.
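
To make question 1 concrete, here is a minimal sketch of the simplest
possible answer (keywords in the URL segments and the title), written in
plain Java.  The class name TopicGuess, the keyword list, and the threshold
are all made up for illustration; none of this is Nutch API, and where such
a check would plug into Nutch is exactly the open question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Nutch. */
public class TopicGuess {

    // Assumed keyword list for a "computer" topic; purely illustrative.
    private static final Set<String> COMPUTER_TERMS = new HashSet<String>(
            Arrays.asList("computer", "software", "hardware",
                          "programming", "linux"));

    /** Counts how many tokens in the URL and title are topic keywords. */
    public static int score(String url, String title) {
        String text = (url + " " + title).toLowerCase(Locale.ROOT);
        int hits = 0;
        // Split on anything that is not a letter or digit, so URL
        // separators such as '/', '.', and '-' also act as token breaks.
        for (String token : text.split("[^a-z0-9]+")) {
            if (COMPUTER_TERMS.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    /** Treats a page as on-topic if it has at least `threshold` hits. */
    public static boolean isOnTopic(String url, String title, int threshold) {
        return score(url, title) >= threshold;
    }
}
```

A rule this crude will misclassify many pages, which is why the answers to
the questions above matter before deciding where to modify Nutch.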

Also, I'm curious as to why you want topic based search.  Are you trying to
provide clustered results like Vivisimo (http://vivisimo.com/)?

-- Jim

On 9/14/06, suxiaoke79 <[EMAIL PROTECTED]> wrote:


  I want to build a topic-based search engine by modifying Nutch. For
example, if I define a "computer" topic, I want to find only information
about computers. I can't find the appropriate point in Fetcher.java where I
can insert my own code. Could you please tell me how to modify the Fetcher
and the parser? Thanks.


