What does your nutch-site.xml look like? What about your plugin.xml for your project?
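The nutch-site.xml question matters mainly because of the `plugin.folders` property, whose default in nutch-default.xml is `plugins`. As far as I understand Nutch 1.x, a relative entry is looked up as a resource on the classpath, not in the working directory. That is consistent with the "Plugins: directory not found: plugins" warning below. An absolute entry is used as-is.

The stand-alone sketch below roughly imitates that lookup using only the JDK, so it can be run with the application's classpath to see whether `plugins` is visible at all. `PluginFolderCheck` and `resolvePluginFolder` are hypothetical names. The real logic in Nutch's plugin code may differ in detail.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

/**
 * Hypothetical diagnostic, not Nutch code: approximates how a single
 * plugin.folders entry may be resolved. Absolute paths are used as-is;
 * relative names are looked up as resources on the classpath.
 */
public class PluginFolderCheck {

    /** Returns the resolved plugin directory, or null if it cannot be located. */
    public static File resolvePluginFolder(String name) {
        File directory = new File(name);
        if (directory.isAbsolute()) {
            // Absolute entries bypass the classpath entirely.
            return directory;
        }
        // Relative entries (the default value is "plugins") are searched on the classpath.
        URL url = PluginFolderCheck.class.getClassLoader().getResource(name);
        if (url == null) {
            // Nothing with this name is visible on the classpath.
            return null;
        }
        if (!"file".equals(url.getProtocol())) {
            // Found, but not as a plain directory (e.g. only as a jar: URL).
            return null;
        }
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "plugins";
        File dir = resolvePluginFolder(name);
        System.out.println(dir == null
                ? "could not resolve plugin folder: " + name
                : "plugin folder resolves to: " + dir.getAbsolutePath());
    }
}
```

If this prints "could not resolve" for `plugins` under the application's classpath, one option that may be worth trying is to set `plugin.folders` in nutch-site.xml to an absolute directory that exists on every node.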
Jim

On Mon, Jul 16, 2012 at 8:09 AM, Emre Çelikten <[email protected]> wrote:
> Hello,
>
> I am developing a Java application that uses Nutch as a Maven dependency.
> I run Nutch jobs from my application the same way Nutch itself does,
> by calling them like this:
>
> ToolRunner.run(NutchConfiguration.create(), new Injector(), args);
>
> I have been unable to get it to work because it is not able to find the
> plugins, resulting in "java.lang.RuntimeException: Error in configuring
> object" errors. I have been trying unsuccessfully since last week. I
> think I have narrowed down the problem enough to ask here.
>
> Here are the details.
>
> I am using Nutch 1.5.
>
> When I run Nutch like this:
>
> ./hadoop jar /apps/nutchjob/apache-nutch-1.5.job org.apache.nutch.crawl.Injector crawl/crawldb urls/urls
>
> here's what the logs say about plugins:
>
> 2012-07-16 14:19:48,450 INFO plugin.PluginRepository - Plugins: looking in: /hadooptmp/mapred/local/taskTracker/hduser/jobcache/job_201207161219_0026/jars/classes/plugins
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - Registered Plugins:
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
> 2012-07-16 14:19:48,787 INFO plugin.PluginRepository - HTTP Framework (lib-http)
>
> ...
>
> When I run my own application:
>
> ./hadoop jar /apps/myapp/myapp.jar myapp.MyApp
>
> The logs say:
>
> 2012-07-16 13:13:38,407 WARN plugin.PluginRepository - Plugins: directory not found: plugins
> 2012-07-16 13:13:38,407 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2012-07-16 13:13:38,407 INFO plugin.PluginRepository - Registered Plugins:
> 2012-07-16 13:13:38,407 INFO plugin.PluginRepository - NONE
>
> Both are using vanilla Nutch configuration. Their folder structure is
> almost the same, except that in mine Nutch is in the lib folder as a jar
> library and the jar also includes my own class files. Plugins are located
> under classes/plugins in the jar file.
>
> Strangely, in the second case, Hadoop only extracts the contents of the
> Nutch library jar, which does not contain any plugins, to the jobcache
> folder. Nothing from my own jar file is extracted.
>
> Note that my application is not a MapReduce job itself. My main method
> just makes some arrangements and then calls jobs like Injector, Fetcher,
> etc. using ToolRunner. I suspect this might have to do with it. Should I
> make my main class implement the Tool interface and then call it with
> ToolRunner, making it a custom version of the Crawl class?
>
> This might be more of a Hadoop question than a Nutch one, sorry about that.
>
> Also, is it possible for you to distribute the default Nutch plugins as a
> Maven dependency jar? Nutch 1.5 is unusable for its standard use case if
> its default plugins are not included, which defeats the purpose of Maven,
> no?
>
> Any help would be really appreciated.
>
> Thanks very much in advance,
>
> Emre
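The message says the plugins sit under classes/plugins inside the application jar. Before digging into how Hadoop unpacks job jars, it may be worth confirming that from the archive itself. The sketch below uses only `java.util.jar` to list the first-level directories under `classes/plugins/`. `PluginJarCheck`, `pluginNames` and `listPlugins` are hypothetical names, not part of Nutch or Hadoop. The helper inspects the jar on disk and says nothing about what Hadoop actually extracts into the jobcache.

```java
import java.io.IOException;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical helper: lists plugin directories packaged inside a jar. */
public class PluginJarCheck {

    // Layout described in the email: plugins live under classes/plugins/ in the jar.
    static final String PREFIX = "classes/plugins/";

    /** Returns the sorted first-level directory names found below PREFIX. */
    public static SortedSet<String> pluginNames(Iterable<String> entryNames) {
        SortedSet<String> plugins = new TreeSet<>();
        for (String name : entryNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // e.g. lib/*.jar or the application's own classes
            }
            String rest = name.substring(PREFIX.length());
            int slash = rest.indexOf('/');
            if (slash > 0) {
                // "parse-html/plugin.xml" -> "parse-html"
                plugins.add(rest.substring(0, slash));
            }
        }
        return plugins;
    }

    /** Reads all entry names from the jar at jarPath and extracts plugin names. */
    public static SortedSet<String> listPlugins(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return pluginNames(jar.stream()
                    .map(JarEntry::getName)
                    .collect(Collectors.toList()));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PluginJarCheck <path-to-jar>");
            System.exit(2);
        }
        SortedSet<String> plugins = listPlugins(args[0]);
        System.out.println(plugins.isEmpty()
                ? "no plugins under " + PREFIX
                : plugins.size() + " plugin(s): " + plugins);
    }
}
```

For example, `java PluginJarCheck /apps/myapp/myapp.jar`. An empty result would point at the packaging. A non-empty one would suggest the problem is what ends up on the task classpath rather than what is in the jar.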

