> I should mention, that I'm using Nutch in a Web-Application.

It's possible, though it's hard.

> While debugging I came across the runParser method in ParseUtil class in
> which the task.get(MAX_PARSE_TIME, TimeUnit.SECONDS); returns null.

See http://wiki.apache.org/nutch/RunNutchInEclipse#Debugging_and_Timeouts
(the default timeout is 30 seconds; you cannot seriously debug within this time).

> Therefore i included nutch.jar (which i found in the bin.zip download), i
> copied the following folders to the project workspace: conf, crawl,
> plugins, runtime

What about lib/ and all the jars it contains? You need all of them. The
libraries required by the parse plugins are among them, which would explain
why fetching succeeds and parsing fails.

In general, setting up the class path is not trivial. Have a look at the
script bin/nutch and try to construct the path the same way. Or even better
(and much easier to develop): run the crawler from your webapp via
Runtime.exec() or ProcessBuilder, calling a shell script which does the job.

To give more detailed help we need more information:
- class path
- exact call of the crawler

Sebastian
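
P.S. A minimal sketch of the "call a shell script from the webapp" approach,
in case it helps. It assumes Java 8 or later and a wrapper script of your own
that sets up the class path and calls bin/nutch. The class name CrawlLauncher,
the method run() and the script path passed on the command line are made up
for illustration; none of them are part of Nutch.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Nutch): runs an external crawl script. */
final class CrawlLauncher {

    /**
     * Starts the command in workDir, waits up to timeoutSeconds and returns
     * the exit code. Throws IOException if the process does not finish in time.
     */
    static int run(List<String> command, File workDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        // Send the script's stdout/stderr to the parent's streams (in a webapp,
        // usually the container log) so the child never blocks on a full pipe.
        pb.inheritIO();
        Process process = pb.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("crawl script timed out after " + timeoutSeconds + " s");
        }
        return process.exitValue();
    }

    // Usage: java CrawlLauncher <path-to-your-script> <working-directory>
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: CrawlLauncher <script> <workdir>");
            return;
        }
        int exitCode = run(Arrays.asList("sh", args[0]), new File(args[1]), 3600);
        System.out.println("crawl script exited with code " + exitCode);
    }
}
```

Running the crawl in its own process keeps the Nutch class path (lib/, conf/,
plugins/) out of the webapp's class loader, so the script can build it the
same way bin/nutch does.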

