Hi,

I am new to Nutch and am trying to get it to index meta tags from HTML 
pages and store them for searching in Solr. The tags have this form:
<meta name="TITLE" content="Some title" />
<meta name="KEYWORDS" content="Forum, help, build, stuff" />

I would like to store the tags as two different fields in the index. I 
have tried the example explaining how to create a plugin, but the example 
is for Nutch 0.9 and only helps me get started.

I think that I should look at:
$NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

and find the line:
HTMLMetaProcessor.getMetaTags(metaTags, root, base);

But I'm not sure how to proceed from here. Any help would be appreciated, 
and please let me know if you know of an existing plugin that will 
index the meta tags.
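
To make clear what I am after, here is a minimal standalone Java sketch of 
the result I want: the two meta tags pulled out as separate named fields. 
It does not use any Nutch classes. The class name MetaTagSketch, the method 
extract, and the regex-based matching are my own assumptions for 
illustration only; a real plugin would presumably work on the parsed 
document rather than on the raw HTML string.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; this is NOT Nutch code and uses no Nutch classes.
final class MetaTagSketch {

    // Matches <meta name="..." content="..." /> with the attributes in exactly
    // that order (an assumption that holds for the tags shown above).
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns a map from the lower-cased meta name to its content value,
    // one entry per meta tag found, in document order.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            fields.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"TITLE\" content=\"Some title\" />"
            + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />"
            + "</head><body></body></html>";
        Map<String, String> fields = extract(html);
        // Each entry would ideally become its own field in the Solr index.
        System.out.println("title    = " + fields.get("title"));
        System.out.println("keywords = " + fields.get("keywords"));
    }
}
```

In other words, I would like "title" and "keywords" to end up as two 
separately searchable fields, populated from the content attributes.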



Claus Daldorph Nielsen

Theilgaard Mortensen a/s
