How can I limit the number of pages processed from each domain? And how do I set up Nutch to crawl only the domains I have added (i.e. make Nutch ignore external links)? If Nutch doesn't allow this, what algorithm would be best for it?
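To make the question concrete, here is a minimal sketch of the kind of filter being asked about: it accepts a URL only if its host is on a fixed list and that host has not yet reached a page cap. This is plain Java and not Nutch's API; the class name `DomainLimitedFilter`, the method `accept`, and both constructor parameters are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Nutch: a host whitelist plus a per-host page cap.
class DomainLimitedFilter {
    private final Set<String> allowedHosts;   // hosts the crawl may visit (assumed lowercase)
    private final int maxPagesPerHost;        // cap on accepted URLs per host
    private final Map<String, Integer> pagesSeen = new HashMap<>();

    DomainLimitedFilter(Set<String> allowedHosts, int maxPagesPerHost) {
        this.allowedHosts = allowedHosts;
        this.maxPagesPerHost = maxPagesPerHost;
    }

    // Returns true if the URL should be fetched, and counts it against its host.
    boolean accept(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (host == null) {
            return false; // no host component, e.g. a relative link
        }
        host = host.toLowerCase(Locale.ROOT);
        if (!allowedHosts.contains(host)) {
            return false; // external link: host is not on the list
        }
        int seen = pagesSeen.getOrDefault(host, 0);
        if (seen >= maxPagesPerHost) {
            return false; // this host has reached its cap
        }
        pagesSeen.put(host, seen + 1);
        return true;
    }
}
```

The sketch matches hosts exactly, so `www.example.com` and `example.com` count as different hosts; grouping by registered domain would need extra logic.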
P.S. Nutch version 0.7
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
