What does your nutch-site.xml look like?  What about your plugin.xml for
your project?

Jim

On Mon, Jul 16, 2012 at 8:09 AM, Emre Çelikten <[email protected]> wrote:

> Hello,
>
> I am developing a Java application that uses Nutch as a Maven dependency.
> I run Nutch jobs from my application in a way just like Nutch itself does,
> by calling them like:
>
> ToolRunner.run(NutchConfiguration.create(), new Injector(), args);
>
> I have been unable to get it to work because it cannot find the
> plugins, resulting in "java.lang.RuntimeException: Error in configuring
> object" errors. I have been trying without success since last week. I
> think I have narrowed down the problem enough to ask here.
>
> Here are the details.
>
> I am using Nutch 1.5.
>
> When I run Nutch like this:
>
> ./hadoop jar /apps/nutchjob/apache-nutch-1.5.job org.apache.nutch.crawl.Injector crawl/crawldb urls/urls
>
> here's what the logs say about plugins:
>
> 2012-07-16 14:19:48,450 INFO  plugin.PluginRepository - Plugins: looking
> in: /hadooptmp/mapred/local/taskTracker/hduser/jobcache/job_201207161219_0026/jars/classes/plugins
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository - Registered Plugins:
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository -         the nutch
> core extension points (nutch-extensionpoints)
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository -         Basic URL
> Normalizer (urlnormalizer-basic)
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository -         Html Parse
> Plug-in (parse-html)
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository -         Basic
> Indexing Filter (index-basic)
> 2012-07-16 14:19:48,787 INFO  plugin.PluginRepository -         HTTP
> Framework (lib-http)
>
> ...
>
> When I run my own application:
>
> ./hadoop jar /apps/myapp/myapp.jar myapp.MyApp
>
> The logs say:
>
> 2012-07-16 13:13:38,407 WARN  plugin.PluginRepository - Plugins: directory
> not found: plugins
> 2012-07-16 13:13:38,407 INFO  plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2012-07-16 13:13:38,407 INFO  plugin.PluginRepository - Registered Plugins:
> 2012-07-16 13:13:38,407 INFO  plugin.PluginRepository -         NONE
>
>
> Both are using the vanilla Nutch configuration. Their folder structures
> are almost the same, except that in my jar file Nutch sits in the lib
> folder as a jar library and the file also includes my own class files.
> Plugins are located under classes/plugins in the jar file.
>
> Strangely, in the second case, Hadoop only extracts the contents of the
> Nutch library jar, which does not contain any plugins, to the jobcache
> folder. Nothing from my own jar file is extracted.
>
> Note that my application is not a MapReduce job itself. My main method
> just makes some arrangements and then calls jobs like Injector, Fetcher
> etc. using ToolRunner. I suspect this might have something to do with it.
> Should I make my main class implement the Tool interface and then call it
> with ToolRunner, making it a custom version of the Crawl class?
>
> This might be more of a Hadoop question than a Nutch one, sorry about that.
>
> Also, is it possible for you to distribute default Nutch plugins as a
> Maven dependency jar? Nutch 1.5 is unusable for its standard use case if
> its default plugins are not included, which defeats the purpose of Maven,
> no?
>
> Any help would be really appreciated.
>
> Thanks very much in advance,
>
> Emre
>
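
One way to narrow down the "directory not found: plugins" warning quoted
above is to check, from inside the application's own JVM, whether a folder
named "plugins" is visible at all. The sketch below is a hypothetical
helper: PluginFolderCheck is not a Nutch or Hadoop class, and it assumes
the plugin folder is looked up by name on the classpath, which is what the
log message suggests but has not been checked here against the Nutch 1.5
source.

```java
import java.io.File;
import java.net.URL;

/** Hypothetical diagnostic helper; not part of Nutch or Hadoop. */
public class PluginFolderCheck {

    /**
     * Describes how a folder name such as "plugins" resolves from the
     * current JVM: as an absolute path, as a classpath resource, or as a
     * directory relative to the working directory.
     */
    static String describe(String name) {
        File directory = new File(name);
        if (directory.isAbsolute()) {
            return directory.isDirectory()
                    ? "FOUND absolute directory: " + directory
                    : "NOT FOUND absolute directory: " + directory;
        }

        // Prefer the context class loader, fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = PluginFolderCheck.class.getClassLoader();
        }

        URL url = loader.getResource(name);
        if (url != null) {
            // The protocol shows whether the folder is on disk ("file")
            // or only inside an archive ("jar").
            return "FOUND on classpath via " + url.getProtocol() + ": " + url;
        }
        if (directory.isDirectory()) {
            return "FOUND relative to working directory: "
                    + directory.getAbsolutePath();
        }
        return "NOT FOUND: " + name;
    }

    public static void main(String[] args) {
        // Defaults to the folder name that appears in the warning.
        String name = args.length > 0 ? args[0] : "plugins";
        System.out.println(describe(name));
    }
}
```

Calling PluginFolderCheck.describe("plugins") at the start of the
application's main method would show which case applies. A "jar" protocol
in the result means the folder exists only inside an archive and has not
been unpacked to disk, which would be consistent with Hadoop not
extracting the application's own jar into the jobcache folder.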
