Julien,

Thanks, this looks very much like what I need. I have applied the patch, added the lines to nutch-site.xml, and then rebuilt the Nutch project. But I still don't see any meta tags in my index. Do you have any suggestions as to what I might be doing wrong? Perhaps some configuration that I missed?
Claus Daldorph Nielsen
Theilgaard Mortensen a/s
Niels Hemmingsens gade 9
1153 København K
Tlf: 33448555

Julien Nioche <[email protected]>
21-05-2010 09:39
Please respond to [email protected]
To [email protected]
cc
Subject Re: Parse and index meta tags in Nutch 1.0

Claus,

See https://issues.apache.org/jira/browse/NUTCH-809 and a related discussion on
http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:

> Hi,
>
> I am new to Nutch and trying to get Nutch to index meta tags from html
> pages and store them for searching in Solr. The tags are on this form:
> <meta name="TITLE" content="Some title" />
> <meta name="KEYWORDS" content="Forum, help, build, stuff" />
>
> I would like to store the tags as two different fields in the index. I
> have tried the example explaining how to create a plugin but the example
> is for Nutch 0.9 and only helps me getting started.
>
> I think that I should look at :
>
> $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>
> and find the line:
> HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>
> But I'm not sure how to go on from here. Any help would be appreciated and
> you are welcome to inform me if you know of an existing plugin that will
> index the meta tags.
>
>
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s

