Joe Reger, Jr. wrote:

In other words, I'd like to avoid using the command line and instead call
the java classes directly on a scheduled or user-controlled basis from
Tomcat. From what I see in bin/nutch I should be able to replace the
command:
bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
with something like:
net.nutch.tools.CrawlTool crawlTool = new net.nutch.tools.CrawlTool();
String[] args = new String[7];
args[0] = "urls";
args[1] = "-dir";
args[2] = "crawl.test";
args[3] = "-depth";
args[4] = "3";
args[5] = ">&";
args[6] = "crawl.log";
crawlTool.main(args);
Is this possible? Is this smart? What sort of issues will arise if I try
to run everything from Tomcat/Java?

First of all, it's not only perfectly possible, it's actually how the CrawlTool itself is implemented - please take a look at CrawlTool.main ...
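One detail in the snippet from the question: ">&" and "crawl.log" are shell redirection syntax, not arguments, so they should not be passed to main. From Java, the rough equivalent is to redirect System.out and System.err before the call. Below is a minimal sketch of that idea. The class CrawlLauncher and the interface Tool are hypothetical names introduced only for illustration; in a real setup the Tool would presumably wrap a call to net.nutch.tools.CrawlTool.main(args). Whether the tool's log messages actually follow the redirected streams depends on how its logging is configured, so treat this as an assumption to verify.

```java
import java.io.FileOutputStream;
import java.io.PrintStream;

public class CrawlLauncher {

    /** Hypothetical stand-in for a main-style entry point such as CrawlTool.main. */
    public interface Tool {
        void run(String[] args) throws Exception;
    }

    /** Builds only the real arguments; ">&" and "crawl.log" are shell syntax and are left out. */
    public static String[] crawlArgs(String urlFile, String dir, int depth) {
        return new String[] { urlFile, "-dir", dir, "-depth", Integer.toString(depth) };
    }

    /** Runs the tool with stdout and stderr sent to logFile, then restores both streams. */
    public static void runWithLog(Tool tool, String[] args, String logFile) throws Exception {
        PrintStream oldOut = System.out;
        PrintStream oldErr = System.err;
        try (PrintStream log = new PrintStream(new FileOutputStream(logFile), true)) {
            // Note: System.setOut/setErr are JVM-wide, so inside Tomcat this
            // affects every web application until the streams are restored.
            System.setOut(log);
            System.setErr(log);
            tool.run(args);
        } finally {
            System.setOut(oldOut);
            System.setErr(oldErr);
        }
    }
}
```

A call might then look like CrawlLauncher.runWithLog(a -> net.nutch.tools.CrawlTool.main(a), CrawlLauncher.crawlArgs("urls", "crawl.test", 3), "crawl.log"), assuming the Nutch classes are on the web application's classpath.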


As for the issues: keep in mind that most Nutch processing tasks consume a lot of resources. If you run such a task in the same JVM instance as the whole app server, it can exhaust a shared resource (file handles, heap space, CPU, I/O, etc.) and starve the other applications running in that JVM.
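One possible way to limit this risk, not something Nutch itself prescribes, is to start the crawl in a separate child JVM from the web application, so that its heap and file handles are not shared with Tomcat. The sketch below assumes this approach; the class name ChildJvmCrawl, the heap limit, and the classpath value are hypothetical and would need to match the actual installation. CPU and disk I/O are still shared at the operating-system level, so this only contains part of the problem.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ChildJvmCrawl {

    /** Builds a command line for a child JVM; classpath, maxHeap and mainClass are caller-supplied. */
    public static List<String> command(String classpath, String maxHeap,
                                       String mainClass, String... args) {
        // Reuse the java binary of the currently running JVM.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(java);
        cmd.add("-Xmx" + maxHeap);   // cap the child's heap independently of Tomcat's
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        for (String a : args) {
            cmd.add(a);
        }
        return cmd;
    }

    /** Starts the child process, writes its stdout and stderr to logFile, returns the exit code. */
    public static int run(List<String> cmd, File logFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);   // merge stderr into stdout, like ">&" in the shell
        pb.redirectOutput(logFile);
        return pb.start().waitFor();    // blocks the calling thread until the child exits
    }
}
```

Because run blocks until the child exits, a web application would normally call it from a background thread or scheduler rather than from a request thread.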


-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com


