You can:
- run bin/nutch org.apache.nutch.parse.ParserChecker and check that you are getting metatag.* in the parse metadata
- check in the log that the parse-metatags plugin is really loaded
- run 'ant test-plugins' and see the output in build/parse-metatags
- check that you've added the field definitions to the Solr schema
- index with Lucene and use Luke to check that the fields are created

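For the first check, a small helper can make the result easier to read. This is only a sketch: it assumes you have saved the ParserChecker output to a text file and that the parse metadata shows keys as "metatag.<name>=" or "metatag.<name>:". The exact output layout varies between Nutch versions, and the function name and sample string below are hypothetical.

```python
import re
from typing import List

# Assumption: metadata keys are printed as "metatag.<name>=" or "metatag.<name>:".
# Adjust the pattern if your Nutch version formats the parse metadata differently.
METATAG_KEY = re.compile(r"(metatag\.[\w.-]+)\s*[=:]")


def find_metatag_keys(checker_output: str) -> List[str]:
    """Return the distinct metatag.* keys found in the given text, sorted."""
    return sorted(set(METATAG_KEY.findall(checker_output)))


# Hypothetical sample text, not real ParserChecker output.
sample = "Parse Metadata: metatag.keywords=Forum, help metatag.title=Some title"
print(find_metatag_keys(sample))
```

An empty list would suggest that the plugin is not producing any metatag.* entries at parse time, so the problem is upstream of the Solr schema.
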
On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:

> I never got this to work. So if anybody has some ideas for debugging then
> please post your ideas.
>
> The problem is that the meta tags are never found or added to the Solr
> index. I have no idea why.
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
> Niels Hemmingsens gade 9
> 1153 København K
>
> Tlf: 33448555
>
> Julien Nioche <[email protected]>
> 21-05-2010 13:33
> Please respond to: [email protected]
> To: [email protected]
> Subject: Re: Parse and index meta tags in Nutch 1.0
>
> Have you checked the discussion in
> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html?
> What have you modified in nutch-site.xml?
>
> j.
>
> On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > Julien,
> >
> > Thanks, it looks much like what I need. I have applied the patch, added
> > the lines to nutch-site.xml and then rebuilt the Nutch project. But still
> > I don't see any metatags in my index. Do you have any suggestions as to
> > what I might be doing wrong? Perhaps some configuration that I missed?
> >
> > Claus Daldorph Nielsen
> >
> > Theilgaard Mortensen a/s
> >
> > Julien Nioche <[email protected]>
> > 21-05-2010 09:39
> > Please respond to: [email protected]
> > To: [email protected]
> > Subject: Re: Parse and index meta tags in Nutch 1.0
> >
> > Claus,
> >
> > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> > discussion on
> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
> >
> > Julien
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
> > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am new to Nutch and trying to get Nutch to index meta tags from HTML
> > > pages and store them for searching in Solr. The tags are in this form:
> > > <meta name="TITLE" content="Some title" />
> > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> > >
> > > I would like to store the tags as two different fields in the index. I
> > > have tried the example explaining how to create a plugin but the example
> > > is for Nutch 0.9 and only helps me get started.
> > >
> > > I think that I should look at:
> > >
> > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> > >
> > > and find the line:
> > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> > >
> > > But I'm not sure how to go on from here. Any help would be appreciated,
> > > and you are welcome to inform me if you know of an existing plugin that
> > > will index the meta tags.
> > >
> > > Claus Daldorph Nielsen
> > >
> > > Theilgaard Mortensen a/s

--
DigitalPebble Ltd
http://www.digitalpebble.com

