Hi,

This question is better suited to the Nutch mailing list, but let me give
you a couple of pointers.

If it's only metadata you need, you can use the patch mentioned below. If you
want more flexibility with your data, you can look at writing your own
parser plugin. Here is a good place to start:

http://wiki.apache.org/nutch/WritingPluginExample-0.9

XPath + HtmlCleaner + BeanShell would be a good set of tools for your custom parser.
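
To give a rough idea of the XPath part only, here is a minimal sketch that
pulls meta name/content pairs out of a well-formed XHTML string using just
the JDK's built-in XML and XPath classes. It is not a Nutch plugin and does
not use HtmlCleaner or BeanShell; the class name MetaXPathSketch, the method
extractMeta and the sample page are made up for illustration. In a real
parser plugin you would run something like this on the fetched content and
add the values to the parse metadata.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: extracts <meta name=... content=...>
// pairs from well-formed XHTML with an XPath expression.
public class MetaXPathSketch {

    public static Map<String, String> extractMeta(String xhtml) throws Exception {
        // Parse the markup into a DOM tree. This only works for well-formed
        // input; tag-soup HTML would first need cleaning (e.g. with HtmlCleaner).
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Select every meta element that carries both attributes.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//meta[@name and @content]", doc, XPathConstants.NODESET);

        // Keep document order; a later duplicate name overwrites an earlier one.
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            meta.put(e.getAttribute("name"), e.getAttribute("content"));
        }
        return meta;
    }

    public static void main(String[] args) throws Exception {
        // Sample page is an assumption: no DOCTYPE and no XHTML namespace,
        // which keeps the XPath expression free of namespace handling.
        String page = "<html><head><title>t</title>"
                + "<meta name=\"keywords\" content=\"nutch, solr\"/>"
                + "<meta name=\"description\" content=\"example page\"/>"
                + "</head><body/></html>";
        System.out.println(extractMeta(page));
    }
}
```

Pages that declare the XHTML namespace or a DOCTYPE would likely need extra
setup (namespace-aware XPath, and avoiding external DTD loading), which I
have left out here.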

regards,
Ram

On Thu, Nov 11, 2010 at 9:21 PM, Jean-Luc <jeanl...@gmail.com> wrote:
>
> I'm going down the route of patching nutch so I can use this ParseMetaTags
> plugin:
> https://issues.apache.org/jira/browse/NUTCH-809
>
> Also wondering whether I will be able to use the XMLParser to allow me to
> parse well formed XHTML, using xpath would be bonus:
> https://issues.apache.org/jira/browse/NUTCH-185
>
> Any thoughts appreciated...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Crawling-with-nutch-and-mapping-fields-to-solr-tp1879060p1883295.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
