Hi Claus,

Glad you got it to work. Do you know what the problem was?

BTW you can vote for issues you like in Jira - if enough people find this
plugin useful I'll commit it to the trunk

J.

On 25 May 2010 08:57, Claus Daldorph Nielsen <[email protected]> wrote:

> Julien,
>
> Thank you so much; I really appreciate your help. I have now managed to get
> Nutch to index meta tags in my Solr index (I am using Luke to verify that
> the correct content is in my index). The only thing left now is to find out
> how to search and retrieve content from the new fields in Solr.
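
On searching the new fields: a minimal sketch, assuming a Solr instance at
http://localhost:8983/solr and a field called metatag.keywords (both are
assumptions for illustration; the real field names are whatever was declared
in the Solr schema). It only builds a URL for Solr's standard select handler,
with q restricted to one field and fl listing the fields to return. A field
must be indexed to be searchable and stored to come back in the results.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MetatagQuery {

    // Builds a select URL that searches one field and returns the listed fields.
    // solrBase, field and returnedFields are supplied by the caller; nothing here
    // is specific to Nutch or to the parse-metatags plugin.
    static String buildSelectUrl(String solrBase, String field, String term,
                                 String returnedFields) {
        String q = field + ":" + term; // fielded query, e.g. metatag.keywords:forum
        return solrBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode(returnedFields, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical base URL and field names; adjust to the actual schema.
        System.out.println(buildSelectUrl(
                "http://localhost:8983/solr", "metatag.keywords", "forum",
                "url,metatag.keywords"));
    }
}
```

Opening the printed URL in a browser should show whether the field is
searchable and whether its stored value is returned.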
>
>
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
>
>
>
> Julien Nioche <[email protected]> wrote on 21-05-2010 17:18
> (To: [email protected]; Subject: Re: Parse and index meta tags in Nutch 1.0):
>
> You can:
> - run bin/nutch org.apache.nutch.parse.ParserChecker and check that you
> are getting metatag.* in the parse-metadata
> - check in the log that the parse-metatags plugin is really loaded
> - run 'ant test-plugins' and see the output in build/parse-metatags
> - check that you've added the field definitions in the Solr schema
> - index with Lucene and use Luke to check that the fields are created
>
>
> On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > I never got this to work. So if anybody has any ideas for debugging,
> > please post them.
> >
> > The problem is that the meta tags are never found or added to the Solr
> > index. I have no idea why.
> >
> >
> >
> > Claus Daldorph Nielsen
> >
> > Theilgaard Mortensen a/s
> > Niels Hemmingsens gade 9
> > 1153 København K
> >
> > Tlf: 33448555
> >
> >
> >
> > Julien Nioche <[email protected]> wrote on 21-05-2010 13:33:
> >
> > Have you checked this discussion?
> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
> > What have you modified in nutch-site.xml?
> >
> > j.
> >
> > On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
> >
> > > Julien,
> > >
> > > Thanks, it looks much like what I need. I have applied the patch,
> > > added the lines to nutch-site.xml and then rebuilt the Nutch project.
> > > But I still don't see any metatags in my index. Do you have any
> > > suggestions as to what I might be doing wrong? Perhaps some
> > > configuration that I missed?
> > >
> > >
> > >
> > > Claus Daldorph Nielsen
> > >
> > > Theilgaard Mortensen a/s
> > > Niels Hemmingsens gade 9
> > > 1153 København K
> > >
> > > Tlf: 33448555
> > >
> > >
> > >
> > > Julien Nioche <[email protected]> wrote on 21-05-2010 09:39:
> > >
> > > Claus,
> > >
> > > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> > > discussion on
> > > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
> > >
> > > Julien
> > >
> > > --
> > > DigitalPebble Ltd
> > > http://www.digitalpebble.com
> > >
> > > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am new to Nutch and trying to get Nutch to index meta tags from
> > > > HTML pages and store them for searching in Solr. The tags are in
> > > > this form:
> > > > <meta name="TITLE" content="Some title" />
> > > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> > > >
> > > > I would like to store the tags as two different fields in the index.
> > > > I have tried the example explaining how to create a plugin, but the
> > > > example is for Nutch 0.9 and only helps me get started.
> > > >
> > > > I think that I should look at:
> > > >
> > > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> > > >
> > > > and find the line:
> > > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> > > >
> > > > But I'm not sure how to go on from here. Any help would be
> > > > appreciated, and please let me know if you know of an existing
> > > > plugin that will index the meta tags.
> > > >
> > > >
> > > >
> > > > Claus Daldorph Nielsen
> > > >
> > > > Theilgaard Mortensen a/s
> > >
> > >
> >
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
> >
>
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
>


-- 
DigitalPebble Ltd
http://www.digitalpebble.com
