How do I unsubscribe from this list? Does anyone know?

On Fri, May 21, 2010 at 1:44 PM, Claus Daldorph Nielsen <[email protected]> wrote:
> I have checked the discussion, and in nutch-site.xml I have added:
>
> <property>
>   <name>metatags.names</name>
>   <value>title;keywords</value>
> </property>
>
> <property>
>   <name>query.basic.title.boost</name>
>   <value>2.0</value>
> </property>
>
> <property>
>   <name>query.basic.keywords.boost</name>
>   <value>2.0</value>
> </property>
>
> I have also included 'parse-metatags' in plugin.includes.
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
> Niels Hemmingsens gade 9
> 1153 København K
>
> Tlf: 33448555
>
> Julien Nioche <[email protected]>
> 21-05-2010 13:33
> Please respond to: [email protected]
> To: [email protected]
> Subject: Re: Parse and index meta tags in Nutch 1.0
>
> Have you checked the discussion in
> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html?
> What have you modified in nutch-site.xml?
>
> j.
>
> On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
>
>> Julien,
>>
>> Thanks, it looks much like what I need. I have applied the patch, added
>> the lines to nutch-site.xml, and then rebuilt the Nutch project. But I
>> still don't see any metatags in my index. Do you have any suggestions as
>> to what I might be doing wrong? Perhaps some configuration that I missed?
>>
>> Claus Daldorph Nielsen
>>
>> Julien Nioche <[email protected]>
>> 21-05-2010 09:39
>>
>> Claus,
>>
>> See https://issues.apache.org/jira/browse/NUTCH-809 and a related
>> discussion on
>> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>>
>> Julien
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>>
>> On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I am new to Nutch and trying to get Nutch to index meta tags from html
>> > pages and store them for searching in Solr. The tags are on this form:
>> > <meta name="TITLE" content="Some title" />
>> > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
>> >
>> > I would like to store the tags as two different fields in the index. I
>> > have tried the example explaining how to create a plugin but the example
>> > is for Nutch 0.9 and only helps me getting started.
>> >
>> > I think that I should look at :
>> >
>> > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>> >
>> > and find the line:
>> > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>> >
>> > But I'm not sure how to go on from here. Any help would be appreciated and
>> > you are welcome to inform me if you know of an existing plugin that will
>> > index the meta tags.
>> >
>> > Claus Daldorph Nielsen
>>
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>

-- Karol Rybak

