Hi Claus, Glad you got it to work. Do you know what the problem was?
BTW you can vote for issues you like in Jira - if enough people find this plugin useful I'll commit it to the trunk :-)

On 25 May 2010 08:57, Claus Daldorph Nielsen <[email protected]> wrote:

> Julien,
>
> Thank you so much, I really appreciate your help. I have now managed to get
> Nutch to index meta tags in my Solr index (I am using Luke to verify that
> the correct content is in my index). The only thing left now is to find out
> how to search and get content from the new fields in Solr.
>
> Claus Daldorph Nielsen
> Theilgaard Mortensen a/s
>
> Julien Nioche <[email protected]> wrote on 21-05-2010 17:18
> (Subject: Re: Parse and index meta tags in Nutch 1.0):
>
> You can:
> - run *bin/nutch org.apache.nutch.parse.ParserChecker* and check that you
>   are getting metatag.* in the parse-metadata
> - check in the log that the parse-metatags plugin is really loaded
> - run 'ant test-plugins' and see the output in build/parse-metatags
> - check that you've added the field definitions in the Solr schema
> - index with Lucene and use Luke to check that the fields are created
>
> On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > I never got this to work. So if anybody has some ideas for debugging,
> > please post them.
> >
> > The problem is that the meta tags are never found or added to the Solr
> > index. I have no idea why.
> >
> > Claus Daldorph Nielsen
> > Theilgaard Mortensen a/s
> > Niels Hemmingsens gade 9
> > 1153 København K
> > Tlf: 33448555
> >
> > Julien Nioche <[email protected]> wrote on 21-05-2010 13:33:
> >
> > Have you checked the discussion in
> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html?
> > What have you modified in nutch-site.xml?
> >
> > j.
> >
> > On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
> >
> > > Julien,
> > >
> > > Thanks, it looks much like what I need. I have applied the patch, added
> > > the lines to nutch-site.xml and then rebuilt the Nutch project. But I
> > > still don't see any metatags in my index. Do you have any suggestions as
> > > to what I might be doing wrong? Perhaps some configuration that I missed?
> > >
> > > Claus Daldorph Nielsen
> > > Theilgaard Mortensen a/s
> > >
> > > Julien Nioche <[email protected]> wrote on 21-05-2010 09:39:
> > >
> > > Claus,
> > >
> > > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> > > discussion on
> > > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
> > >
> > > Julien
> > >
> > > --
> > > DigitalPebble Ltd
> > > http://www.digitalpebble.com
> > >
> > > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am new to Nutch and am trying to get Nutch to index meta tags from
> > > > HTML pages and store them for searching in Solr. The tags are in this
> > > > form:
> > > > <meta name="TITLE" content="Some title" />
> > > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> > > >
> > > > I would like to store the tags as two different fields in the index. I
> > > > have tried the example explaining how to create a plugin, but the
> > > > example is for Nutch 0.9 and only helps me get started.
> > > >
> > > > I think that I should look at:
> > > >
> > > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> > > >
> > > > and find the line:
> > > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> > > >
> > > > But I'm not sure how to go on from here. Any help would be appreciated
> > > > and you are welcome to inform me if you know of an existing plugin that
> > > > will index the meta tags.
> > > >
> > > > Claus Daldorph Nielsen
> > > > Theilgaard Mortensen a/s

--
DigitalPebble Ltd
http://www.digitalpebble.com
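
For readers following the thread, here is a rough, self-contained Java sketch of the outcome the debugging checklist is looking for: one field per meta tag, keyed with the metatag. prefix mentioned in the thread. It is not the NUTCH-809 plugin code and does not use any Nutch API. The class name MetaTagSketch, the regex-based parsing and the lower-casing of tag names are all assumptions made for illustration only; a real parser plugin would work on the parsed DOM instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical helper, NOT the NUTCH-809 plugin and not a Nutch API.
class MetaTagSketch {

    // Matches <meta name="..." content="..."> with the attributes in that order,
    // which is the form shown in the original question.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    // Returns one entry per meta tag, keyed as "metatag.<lower-cased name>".
    // Lower-casing the name is an assumption of this sketch.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String key = "metatag." + m.group(1).trim().toLowerCase(Locale.ROOT);
            fields.put(key, m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"TITLE\" content=\"Some title\" />"
                + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />"
                + "</head><body></body></html>";
        // Print each extracted field as "key = value".
        extract(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Whatever keys the real plugin emits, the same names would need matching field definitions in the Solr schema, which is one of the checks listed in the thread.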

