Hi, I am new to Nutch and am trying to get it to index meta tags from HTML pages and store them in Solr for searching. The tags have this form:
<meta name="TITLE" content="Some title" />
<meta name="KEYWORDS" content="Forum, help, build, stuff" />
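To make the goal concrete, here is a rough sketch of the extraction I have in mind. It is plain Java with a regular expression and uses no Nutch classes at all; the class name MetaTagSketch and the method extract are made up for illustration, and it assumes the name attribute always comes before content, as in the tags above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain Java, no Nutch classes. All names here are hypothetical.
class MetaTagSketch {

    // Matches <meta name="..." content="..."> with the attributes in that order.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns a map of lower-cased tag name -> content, in document order.
    static Map<String, String> extract(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            tags.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<meta name=\"TITLE\" content=\"Some title\" /> "
                    + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />";
        Map<String, String> tags = extract(html);
        // Each entry would become its own field in the index.
        System.out.println(tags.get("title"));
        System.out.println(tags.get("keywords"));
    }
}
```

In a real plugin the values would presumably come from the parsed DOM rather than a regular expression; the sketch only shows the name-to-content mapping I want to end up with.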
I would like to store the tags as two different fields in the index. I have tried the example explaining how to create a plugin, but it is written for Nutch 0.9 and only helps me get started. I think I should look at $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java and find the line:
HTMLMetaProcessor.getMetaTags(metaTags, root, base);
But I am not sure how to proceed from there. Any help would be appreciated, and please let me know if you know of an existing plugin that indexes meta tags.

Claus Daldorph Nielsen Theilgaard Mortensen a/s

