To answer your question, I need some more information:

1) How do you decide which "topic" a particular page belongs to?  URL
segments?  The title?  Other HTML page elements?  Latent Semantic Analysis
(http://en.wikipedia.org/wiki/Latent_semantic_indexing)?

2) Given a topic, how will your end users find pages on this topic?
Search?  Link navigation?  Hierarchical categories?

3) If the answer to question 2 is "search", how would your topic search
differ from the standard Nutch search?

4) Do you control all of the content, or the servers hosting it (as on an
intranet)?

I ask because your question, though simply stated, describes a problem that
is not necessarily easy to solve.  Any solution will probably require hooking
into Nutch at several different points.
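For example, if the answer to question 1 is "URL segments and the title",
the topic test itself can be quite simple.  The sketch below is only an
illustration under that assumption: the TopicFilter class, its keyword list
and its isOnTopic() method are hypothetical and not part of Nutch.  Where you
would call such a test (for instance from a URL filter or an HTML parse
filter plugin) is exactly what the questions above are meant to pin down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical keyword-based topic test (not part of Nutch).
 * A page is considered on-topic if any of its URL segments or
 * title words is one of the topic keywords.
 */
final class TopicFilter {

    // Hypothetical keywords for a "computer" topic; a real list would be far larger.
    private static final Set<String> COMPUTER_KEYWORDS = new HashSet<>(Arrays.asList(
            "computer", "computers", "cpu", "software", "hardware", "linux", "programming"));

    /** Lowercases the text and splits it on anything that is not an ASCII letter or digit. */
    static String[] tokenize(String text) {
        if (text == null || text.isEmpty()) {
            return new String[0];
        }
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Returns true if any token of the text is a topic keyword. */
    static boolean containsKeyword(String text) {
        for (String token : tokenize(text)) {
            if (COMPUTER_KEYWORDS.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Checks the URL segments first, then the title words. */
    static boolean isOnTopic(String url, String title) {
        return containsKeyword(url) || containsKeyword(title);
    }

    public static void main(String[] args) {
        System.out.println(isOnTopic("http://example.com/reviews/cpu/x.html", "New chips")); // true
        System.out.println(isOnTopic("http://example.com/recipes/pasta.html", "Easy pasta")); // false
    }
}
```

Keyword matching like this is crude (it misses pages that never use the
keywords and accepts pages that mention them in passing), which is why the
choice between it and something like LSA matters so much.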

Also, I'm curious why you want topic-based search.  Are you trying to
provide clustered results like Vivisimo (http://vivisimo.com/)?

-- Jim

On 9/14/06, suxiaoke79 <[EMAIL PROTECTED]> wrote:


  I want to build a topic-based search engine by modifying Nutch. For
example, if I define a "computer" topic, I want the search to find only
information about computers. I can't find the appropriate place in
Fetcher.java to insert my own code. Please tell me how I can modify the
Fetcher and the parser. Thanks.



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general