Hi people, I've been banging my head against this problem for two days now. Simply put, I want to add a field containing the value of a given meta tag.
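The extraction step on its own is small. Below is a minimal, standalone sketch that uses only the JDK's org.w3c.dom API. It assumes well-formed XHTML input, and the class and method names (MetaTagFinder, findMetaContent) are hypothetical, not part of Nutch. As far as I understand, a Nutch 1.0 HtmlParseFilter is handed the parsed page as an org.w3c.dom.DocumentFragment, so the same tree walk could in principle run inside such a filter. The Nutch-specific wiring (plugin.xml, storing the value in the parse metadata, indexing it into a field) is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Nutch class: finds <meta name="..." content="..."> in a DOM tree.
public class MetaTagFinder {

    // Depth-first walk starting at any DOM node (Document, DocumentFragment, Element).
    // Returns the content attribute of the first matching meta element, or null if none.
    // The element name and the value of its "name" attribute are compared
    // case-insensitively, since some HTML parsers upper-case tag names.
    public static String findMetaContent(Node node, String metaName) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("meta".equalsIgnoreCase(el.getNodeName())
                    && metaName.equalsIgnoreCase(el.getAttribute("name"))) {
                return el.getAttribute("content");
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = findMetaContent(children.item(i), metaName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head>"
                + "<meta name=\"keywords\" content=\"nutch, solr\"/>"
                + "<meta name=\"author\" content=\"Example Author\"/>"
                + "</head><body><p>Hello</p></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        System.out.println(findMetaContent(doc, "author"));      // prints "Example Author"
        System.out.println(findMetaContent(doc, "description")); // prints "null" (no such tag)
    }
}
```

Returning the first match keeps the sketch simple; a page that repeats a meta name would need a list instead.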
I've been trying the parse-xml plugin, but it seems that it doesn't work with version 1.0. I've also tried the code at http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html and it hasn't worked, and I don't know why. I don't even know whether my plugin is being used... or even looked for! Nutch seems to have an infuriating "fail silently" policy for plugins. To check whether my code is ever reached, I put a System.exit(1) in my filters. It has never been triggered, even though my config tells Nutch to use the plugin.

Here's my config:

nutch-site.xml:

    ...
    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
    </property>
    ...

parse-plugins.xml:

    ...
    <mimeType name="application/xhtml+xml">
      <plugin id="parse-html" />
      <plugin id="metadata" />
    </mimeType>
    <mimeType name="text/html">
      <plugin id="parse-html" />
      <plugin id="metadata" />
    </mimeType>
    <mimeType name="text/sgml">
      <plugin id="parse-html" />
      <plugin id="metadata" />
    </mimeType>
    <mimeType name="text/xml">
      <plugin id="parse-html" />
      <plugin id="parse-rss" />
      <plugin id="metadata" />
      <plugin id="feed" />
    </mimeType>
    ...
    <alias name="metadata" extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter" />
    ...

I've also copied the plugin.xml and jar from my build/metadata directory to the plugins root directory.

Nonetheless, Nutch runs and puts data into Solr for me. As far as I can tell, Nutch is completely unaware of my plugin despite my config options. Is there some other place I need to tell Nutch to use my plugin? Is there some other way to do this without writing a plugin? This seems like a lot of work just to get a meta tag into a field.

Any help would be appreciated.

Sincerely,
John Sherwood

