If you control the "temporary links" pages, then just add a robots meta tag. Take a look at http://www.robotstxt.org/wc/meta-user.html to see what your options are.
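For a pass-through page like this, the usual tag is <meta name="robots" content="noindex,follow">. It keeps the page itself out of the index while still letting the crawler follow its links to the pages you do want indexed. Below is a rough Python sketch of how a crawler reads that tag. It is not Nutch code, and the class and function names (RobotsMetaParser, robots_flags) are made up for illustration. It only shows the directive logic described on the robotstxt.org page: "noindex" and "nofollow" switch off indexing and link-following, and "none" means both.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noindex,follow".
        for token in (attrs.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_flags(html):
    """Return (may_index, may_follow); both default to True when no robots tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    d = parser.directives
    # "none" is shorthand for "noindex,nofollow".
    may_index = not ({"noindex", "none"} & d)
    may_follow = not ({"nofollow", "none"} & d)
    return may_index, may_follow


# A "temporary links" page: skip it in the index, but still follow its links.
temp_page = """<html><head>
<meta name="robots" content="noindex,follow">
</head><body><a href="/real-page.html">real page</a></body></html>"""

print(robots_flags(temp_page))  # (False, True)
```

With that tag in place, the page itself stays out of the index, but /real-page.html is still discovered through its link. That is why you shouldn't need to extend Nutch for this case.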
Jake.

-----Original Message-----
From: Elwin [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 4:38 AM
To: [email protected]
Subject: How to control contents to be indexed?

In the process of crawling and indexing, some pages are just used as "temporary links" to the pages I want to index, so how can I control those kinds of pages not being indexed? Or which part of nutch should I extend?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
