Hi All,
I was able to fix the problem. Thank you, everyone. Here is a summary of
the problem:
Problem:
<nutch-source>/build/plugins/<plugin-name-existing>/plugin.xml was
overwritten because I had used plugin-name-existing as the id in
<nutch-source>/src/plugin/<plugin-name-new>/plugin.xml (the /plugin/@id
attribute). That was my mistake, but after I corrected it (changing
/plugin/@id to plugin-name-new), the build script never re-copied the
original <nutch-source>/build/plugins/<plugin-name-existing>/plugin.xml,
so the overwritten copy stayed in place.
Not sure if this is intended.
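
In case it helps anyone who hits the same thing, here is a rough sketch of
a check for this kind of id collision. It is not part of Nutch:
DuplicatePluginIdCheck and its method names are made up for illustration,
and it only assumes that each plugin lives in its own directory under
src/plugin with a plugin.xml whose root element carries the id attribute.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper (not part of Nutch): finds ids declared by more than one plugin.xml. */
public class DuplicatePluginIdCheck {

  /** Reads the id attribute of the root element of one plugin.xml. */
  public static String readPluginId(Path pluginXml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(pluginXml.toFile());
    return doc.getDocumentElement().getAttribute("id");
  }

  /** Maps each declared id to the names of the plugin directories declaring it. */
  public static Map<String, List<String>> collectIds(Path pluginRoot) throws Exception {
    Map<String, List<String>> dirsById = new TreeMap<>();
    try (DirectoryStream<Path> dirs = Files.newDirectoryStream(pluginRoot)) {
      for (Path dir : dirs) {
        Path xml = dir.resolve("plugin.xml");
        if (!Files.isRegularFile(xml)) {
          continue; // not a plugin directory (assumed layout)
        }
        dirsById.computeIfAbsent(readPluginId(xml), k -> new ArrayList<>())
            .add(dir.getFileName().toString());
      }
    }
    return dirsById;
  }

  /** Keeps only the ids that are declared in more than one directory. */
  public static Map<String, List<String>> findDuplicates(Path pluginRoot) throws Exception {
    Map<String, List<String>> duplicates = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : collectIds(pluginRoot).entrySet()) {
      if (e.getValue().size() > 1) {
        List<String> dirs = new ArrayList<>(e.getValue());
        Collections.sort(dirs); // directory listing order is unspecified
        duplicates.put(e.getKey(), dirs);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) throws Exception {
    // Default path is an assumption; pass the real plugin source dir as args[0].
    Path root = Paths.get(args.length > 0 ? args[0] : "src/plugin");
    findDuplicates(root).forEach((id, dirs) ->
        System.out.println("id '" + id + "' is declared by: " + dirs));
  }
}
```

Running it against <nutch-source>/src/plugin before building should list
any id that two plugin folders both claim. It does not repair a build
directory that has already been overwritten.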
BTW, I found a whitespace typo in PluginManifestParser.java:187
Current:
LOG.debug("plugin: id=" + id + " name=" + name + " version=" + version
+ " provider=" + providerName + "class=" + pluginClazz);
Correct: (space before class)
LOG.debug("plugin: id=" + id + " name=" + name + " version=" + version
+ " provider=" + providerName + " class=" + pluginClazz);
Regards,
Ake Tangkananond
On 8/3/12 7:56 PM, "Ake Tangkananond" <[email protected]> wrote:
>Hello,
>
>Thank you for the very quick reply. Yes, I run it in local mode, and my
>plugin's plugin.xml and parse-image.jar are present in
>runtime/local/plugins.
>
>I have just found the root cause. Here is how I found it:
>I inserted the following code at PluginDescriptor.java line 288 to print
>all the library lookup paths:
> System.out.println(java.util.Arrays.toString(urls));
>And I see a problem here:
>
> [file:/usr/local/apache-nutch-2.0.0-source/runtime/local/plugins/parse-html/parse-image.jar]
>
>I am still figuring out how to fix it gracefully. If anyone knows the
>right place to fix it, please shed some light. xD
>
>
>BTW, I'm using IntelliJ IDEA, but I don't know how to configure it for an
>Ivy project. It would be great if someone could give me a hand at iamake
>at gmail dot com ;-)
>
>
>
>Regards,
>Ake Tangkananond
>
>
>
>On 8/3/12 6:59 PM, "Ferdy Galema" <[email protected]> wrote:
>
>>Hi,
>>
>>Some quick pointers: Do you run it in local mode? Are your plugin's
>>plugin.xml and parse-image.jar present in runtime/local/plugins after you
>>build it? Do you use external libraries?
>>
>>Ferdy.
>>
>>On Fri, Aug 3, 2012 at 1:49 PM, Ake Tangkananond <[email protected]>
>>wrote:
>>
>>> Hello,
>>>
>>> I have a question about the Nutch 2 plugin implementation.
>>>
>>> I am implementing an image parser. It used to work fine in Nutch 1.5,
>>> but after I migrated the code to Nutch 2.0, I get errors that I have
>>> spent several hours on without being able to trace the cause. I would
>>> appreciate any insight from the mailing list.
>>>
>>> While parsing the fetched content, I got the following error in
>>> logs/hadoop.log:
>>> 2012-08-03 18:28:25,304 ERROR parse.ParserFactory - PluginRuntimeException
>>> org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: <my plugin class name>
>>>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
>>>     at org.apache.nutch.parse.ParserFactory.getFields(ParserFactory.java:209)
>>>     at org.apache.nutch.parse.ParserJob.getFields(ParserJob.java:191)
>>>     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:243)
>>>     at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
>>>     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)
>>> Caused by: java.lang.ClassNotFoundException: <my plugin class name>
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
>>>     ... 7 more
>>> 2012-08-03 18:28:25,654 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
>>>
>>> What I did was copy the minimal necessary files from other plugin
>>> folders and modify them to suit my needs. Then I edited nutch-site.xml
>>> to include my plugin and edited parse-plugins.xml to register the
>>> mimeType. I added parse-image to the 2 packageset elements in
>>> <nutch-source>/build.xml, added ant targets under deploy and clean in
>>> <nutch-source>/src/plugin/build.xml, and then rebuilt everything. (This
>>> is what I did in Nutch 1.5 and it worked, but no luck with Nutch 2.)
>>>
>>> Could you advise what else I am missing, or what more information I
>>> should provide? Thank you very much!
>>>
>>>
>>> Regards,
>>> Ake Tangkananond
>>>
>>>
>>>
>
>