Thanks for your answer.
I have managed to get it running now. The problem was in the parse-html
plugin, which was missing its dependencies on nekohtml and tagsoup. I added
both as external jars to my environment.
hadoop.log now reports that my plugin was loaded successfully:
2012-08-12 16:06:43,712 INFO plugin.PluginRepository - URL Meta Indexing Filter (simpletestplugin)
However, the crawler never calls it. My 'Test' message is never printed,
and execution does not stop at a breakpoint set inside the filter method
of my plugin class.
I didn't see any error messages. I also double-checked plugin.xml,
build.xml, src/plugin/build.xml and nutch-site.xml, and compared each of
them with existing plugin code. Everything seems to be correct, so I am
quite clueless about how to proceed.
Do you have any tips?
On 12.08.2012 14:01, Lewis John Mcgibbney wrote:
Please carefully read the XML configuration in the file you have pasted.
On Sun, Aug 12, 2012 at 12:11 PM, Alaak wrote:
<extension id="de.effingo.crawler" name="Some Simple Test Plugin"
point="org.apache.nutch.indexer.IndexingFilter">
<implementation id="page-filter" class="testplugin.SimpleFilter"/>
</extension>
</plugin>
The extension id attribute should be the package name followed by your
class name; judging from your Java code, that would be
testplugin.SimpleFilter. Additionally, the implementation id attribute
should be SimpleFilter.
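Applied literally to the snippet quoted above, those two suggestions would turn the extension into roughly the following. This is only a sketch of the advice as stated, not a verified fix: the name and point attributes and the class are copied unchanged, and the enclosing <plugin> element is omitted.

```xml
<!-- extension id = package name + class name, per the suggestion above -->
<extension id="testplugin.SimpleFilter" name="Some Simple Test Plugin"
           point="org.apache.nutch.indexer.IndexingFilter">
  <!-- implementation id = bare class name; class stays fully qualified -->
  <implementation id="SimpleFilter" class="testplugin.SimpleFilter"/>
</extension>
```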
Is build.xml configured correctly? Have you added the plugin to the
plugin.includes property in nutch-site.xml?
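For comparison, a plugin is usually wired into the build and into the runtime roughly as below. This is a sketch that assumes the plugin lives in a directory named src/plugin/simpletestplugin, matching the id shown in the log line above; that directory name is an assumption.

The plugin's own build.xml typically just imports the shared plugin build file:

```xml
<!-- src/plugin/simpletestplugin/build.xml (directory name assumed) -->
<project name="simpletestplugin" default="jar-core">
  <import file="../build-plugin.xml"/>
</project>
```

src/plugin/build.xml then needs an entry for the plugin inside its existing "deploy" target, alongside the entries for the bundled plugins:

```xml
<!-- added inside the existing "deploy" target of src/plugin/build.xml -->
<ant dir="simpletestplugin" target="deploy"/>
```

plugin.includes is a regular expression matched against plugin ids. The other names in this value are only illustrative; keep whatever your configuration already lists and append the new id:

```xml
<!-- nutch-site.xml: everything before "simpletestplugin" is illustrative -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)|simpletestplugin</value>
</property>
```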