I have been trying to learn the Nutch code base by stepping through the code
in the Eclipse debugger. However, I am unable to understand a piece of code
in the Injector.
When I run the crawl command used for intranet crawling, it successfully
injects URLs into the database. When I run the standalone Injector on the
same set of URLs, it injects nothing: every call to
PrefixURLFilter.filter(url) returns null.
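
For reference, my understanding so far is that a prefix filter hands back
the URL when it starts with one of the configured prefixes and returns null
otherwise. The sketch below is only how I picture it, not the actual Nutch
source; the class name PrefixFilterSketch and the example prefix are made up
for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a prefix-based URL filter; NOT the Nutch class.
class PrefixFilterSketch {
    private final List<String> prefixes;

    PrefixFilterSketch(List<String> prefixes) {
        this.prefixes = prefixes;
    }

    // Returns the URL unchanged if it starts with an accepted prefix,
    // otherwise null (meaning "rejected").
    String filter(String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example prefix is an assumption, not taken from my real config.
        PrefixFilterSketch f = new PrefixFilterSketch(
                Arrays.asList("http://intranet.example.com/"));
        System.out.println(f.filter("http://intranet.example.com/index.html"));
        System.out.println(f.filter("http://other.example.org/"));
    }
}
```

If that picture is right, getting null for every URL would mean that none of
the prefixes the standalone Injector loaded match my seed URLs, or that it
loaded no prefixes at all.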
I saw in an archive that the crawl command uses crawl-tool.xml for its
config, whereas nutch-site.xml is used otherwise. So I made the
nutch-site.xml file exactly the same as crawl-tool.xml, but this seemed to
have no effect. Does anyone know why?
I apologize for the newb question, but any help would be greatly
appreciated.
-Charlie
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general