Have you checked the discussion in http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html? What have you modified in nutch-site.xml?
j.

On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:

> Julien,
>
> Thanks, it looks much like what I need. I have applied the patch, added
> the lines to nutch-site.xml and then rebuilt the Nutch project. But I
> still don't see any metatags in my index. Do you have any suggestions as
> to what I might be doing wrong? Perhaps some configuration that I missed?
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
> Niels Hemmingsens gade 9
> 1153 København K
>
> Tlf: 33448555
>
>
> Julien Nioche <[email protected]>
> 21-05-2010 09:39
> Please respond to
> [email protected]
>
> To
> [email protected]
> cc
>
> Subject
> Re: Parse and index meta tags in Nutch 1.0
>
>
> Claus,
>
> See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> discussion on
> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
> On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > Hi,
> >
> > I am new to Nutch and trying to get Nutch to index meta tags from html
> > pages and store them for searching in Solr. The tags are in this form:
> > <meta name="TITLE" content="Some title" />
> > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> >
> > I would like to store the tags as two different fields in the index. I
> > have tried the example explaining how to create a plugin, but the example
> > is for Nutch 0.9 and only helps me get started.
> >
> > I think that I should look at:
> >
> > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> >
> > and find the line:
> > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> >
> > But I'm not sure how to go on from here. Any help would be appreciated,
> > and you are welcome to inform me if you know of an existing plugin that
> > will index the meta tags.
> >
> > Claus Daldorph Nielsen
> >
> > Theilgaard Mortensen a/s

--
DigitalPebble Ltd
http://www.digitalpebble.com
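
For reference, the goal described in the original question (turning the TITLE and KEYWORDS meta tags into two separate fields) can be illustrated outside Nutch with a small standalone sketch. This is not the NUTCH-809 patch and does not use any Nutch API: the class name `MetaTagSketch` and the method `extractMetaTags` are hypothetical, and the regular expression assumes the exact tag shape quoted in the thread (double-quoted attributes, `name` before `content`). A real plugin would work on the parsed DOM instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of Nutch or of the NUTCH-809 patch.
public class MetaTagSketch {

    // Assumes tags shaped like <meta name="..." content="..." />,
    // with double-quoted attributes and name appearing before content.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns one entry per meta tag: lower-cased name -> content.
    public static Map<String, String> extractMetaTags(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            fields.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><head>\n"
            + "<meta name=\"TITLE\" content=\"Some title\" />\n"
            + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />\n"
            + "</head><body></body></html>";

        Map<String, String> fields = extractMetaTags(html);
        // Each meta tag becomes its own field, as asked for in the question.
        System.out.println("title    = " + fields.get("title"));
        System.out.println("keywords = " + fields.get("keywords"));
    }
}
```

Lower-casing the names is only a choice made here so that TITLE and title map to the same field; whether the index should do the same depends on the Solr schema in use.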

