I suppose you've also tried https://issues.apache.org/jira/browse/NUTCH-783 as suggested in the previous discussion?
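
One more thing that may be worth ruling out is whether the pages really expose the tags under the names you expect. Below is a minimal standalone sketch in Python. It is not Nutch code and does not use the NUTCH-809 patch. The names MetaTagCollector and extract_metatags are made up for illustration, and the "metatag.<lower-cased name>" key format is only my assumption, chosen to resemble the metatag.* entries that ParserChecker should show.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metatags = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Assumption: keys are "metatag." plus the lower-cased tag name.
            self.metatags["metatag." + name.lower()] = content


def extract_metatags(html):
    """Return a dict of meta tag names (prefixed) to their content."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.metatags


if __name__ == "__main__":
    # The two tags from the original question.
    sample = """<html><head>
    <meta name="TITLE" content="Some title" />
    <meta name="KEYWORDS" content="Forum, help, build, stuff" />
    </head><body></body></html>"""
    for key, value in extract_metatags(sample).items():
        print(key, "=", value)
```

Running it on a saved copy of one of your pages and comparing the keys with what ParserChecker reports, and with the field names in your Solr schema, should show at which stage the tags go missing.
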

On 21 May 2010 16:18, Julien Nioche <[email protected]> wrote:

> You can:
> - run bin/nutch org.apache.nutch.parse.ParserChecker and check that you
>   are getting metatag.* in the parse-metadata
> - check in the log that the parse-metatags plugin is really loaded
> - run 'ant test-plugins' and see the output in build/parse-metatags
> - check that you've added the field definitions in the SOLR schema
> - index with Lucene and use Luke to check that the fields are created
>
> On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:
>
>> I never got this to work. So if anybody has some ideas for debugging then
>> please post your ideas.
>>
>> The problem is that the meta tags are never found or added to the Solr
>> index. I have no idea why.
>>
>> Claus Daldorph Nielsen
>>
>> Theilgaard Mortensen a/s
>> Niels Hemmingsens gade 9
>> 1153 København K
>>
>> Tlf: 33448555
>>
>> Julien Nioche <[email protected]>
>> 21-05-2010 13:33
>> Please respond to: [email protected]
>> To: [email protected]
>> Subject: Re: Parse and index meta tags in Nutch 1.0
>>
>> Have you checked the discussion in
>> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html ?
>> What have you modified in nutch-site.xml?
>>
>> j.
>>
>> On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
>>
>> > Julien,
>> >
>> > Thanks, it looks much like what I need. I have applied the patch, added
>> > the lines to nutch-site.xml and then rebuilt the Nutch project. But I
>> > still don't see any metatags in my index. Do you have any suggestions
>> > as to what I might be doing wrong? Perhaps some configuration that I
>> > missed?
>> >
>> > Claus Daldorph Nielsen
>> >
>> > Theilgaard Mortensen a/s
>> > Niels Hemmingsens gade 9
>> > 1153 København K
>> >
>> > Tlf: 33448555
>> >
>> > Julien Nioche <[email protected]>
>> > 21-05-2010 09:39
>> > Please respond to: [email protected]
>> > To: [email protected]
>> > Subject: Re: Parse and index meta tags in Nutch 1.0
>> >
>> > Claus,
>> >
>> > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
>> > discussion on
>> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>> >
>> > Julien
>> >
>> > --
>> > DigitalPebble Ltd
>> > http://www.digitalpebble.com
>> >
>> > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am new to Nutch and trying to get Nutch to index meta tags from html
>> > > pages and store them for searching in Solr. The tags are in this form:
>> > > <meta name="TITLE" content="Some title" />
>> > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
>> > >
>> > > I would like to store the tags as two different fields in the index. I
>> > > have tried the example explaining how to create a plugin, but the
>> > > example is for Nutch 0.9 and only helps me get started.
>> > >
>> > > I think that I should look at:
>> > >
>> > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>> > >
>> > > and find the line:
>> > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>> > >
>> > > But I'm not sure how to go on from here. Any help would be appreciated,
>> > > and you are welcome to inform me if you know of an existing plugin that
>> > > will index the meta tags.
>> > >
>> > > Claus Daldorph Nielsen
>> > >
>> > > Theilgaard Mortensen a/s
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com

--
DigitalPebble Ltd
http://www.digitalpebble.com

