You can:
- run bin/nutch org.apache.nutch.parse.ParserChecker on one of your pages and
check that you are getting metatag.* entries in the parse metadata
- check in the log that the parse-metatags plugin is actually loaded
- run 'ant test-plugins' and see the output in build/parse-metatags
- check that you've added the field definitions to the Solr schema
- index with Lucene and use Luke to check that the fields are created
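As a rough illustration of what the first check is looking for, here is a small standalone Java sketch. It is NOT the parse-metatags code from NUTCH-809. It assumes the tags look like the ones in the original question (name before content, double-quoted values), and it assumes the tag name is lower-cased after the metatag. prefix. MetaTagSketch, extractMetaTags and splitKeywords are hypothetical names. It also splits KEYWORDS on commas, which is one possible way to get the two separate fields asked about below, with keywords as a multi-valued field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical standalone sketch (not the parse-metatags plugin): pulls
 * <meta name="..." content="..."> pairs out of a page and stores them under
 * a "metatag." prefix, i.e. the kind of keys ParserChecker should show.
 */
public class MetaTagSketch {

    // Assumption: name comes before content and both values are double-quoted.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    /** Returns metatag.<lower-cased name> -> content, in document order. */
    public static Map<String, String> extractMetaTags(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Assumption: names are lower-cased, so TITLE becomes metatag.title.
            String key = "metatag." + m.group(1).toLowerCase(Locale.ROOT);
            metadata.put(key, m.group(2));
        }
        return metadata;
    }

    /** Splits a comma-separated keywords value into trimmed, non-empty terms. */
    public static List<String> splitKeywords(String value) {
        List<String> terms = new ArrayList<>();
        if (value == null) {
            return terms; // no KEYWORDS tag on the page
        }
        for (String part : value.split(",")) {
            String term = part.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"TITLE\" content=\"Some title\" />"
                + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />"
                + "</head><body></body></html>";
        Map<String, String> md = extractMetaTags(html);
        // If no metatag.* keys appear at parse time, nothing further down
        // (Solr schema, indexing) can make the fields show up.
        System.out.println(md);
        System.out.println("keywords values: " + splitKeywords(md.get("metatag.keywords")));
    }
}
```

Running main should print {metatag.title=Some title, metatag.keywords=Forum, help, build, stuff} followed by the four keyword terms. A regex keeps the sketch self-contained, but it is only a sanity check: the real HTML parser works on the parsed DOM (see the HTMLMetaProcessor.getMetaTags call quoted further down), so use ParserChecker on your own pages for the actual test.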


On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:

> I never got this to work, so if anybody has any ideas for debugging,
> please post them.
>
> The problem is that the meta tags are never found or added to the Solr
> index. I have no idea why.
>
>
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
> Niels Hemmingsens gade 9
> 1153 København K
>
> Tlf: 33448555
>
>
>
> Julien Nioche <[email protected]>
> 21-05-2010 13:33
> Please respond to
> [email protected]
>
>
> To
> [email protected]
> cc
>
> Subject
> Re: Parse and index meta tags in Nutch 1.0
>
> Have you checked the discussion in
> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html?
> What have you modified in nutch-site.xml?
>
> j.
>
> On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > Julien,
> >
> > Thanks, it looks very much like what I need. I have applied the patch,
> > added the lines to nutch-site.xml and then rebuilt the Nutch project, but
> > I still don't see any meta tags in my index. Do you have any suggestions
> > as to what I might be doing wrong? Perhaps some configuration that I missed?
> >
> >
> >
> >
> > Julien Nioche <[email protected]>
> > 21-05-2010 09:39
> > Please respond to
> > [email protected]
> >
> >
> > To
> > [email protected]
> > cc
> >
> > Subject
> > Re: Parse and index meta tags in Nutch 1.0
> >
> > Claus,
> >
> > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> > discussion on
> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
> >
> > Julien
> >
> > --
> > DigitalPebble Ltd
> > http://www.digitalpebble.com
> >
> > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am new to Nutch and am trying to get Nutch to index meta tags from HTML
> > > pages and store them for searching in Solr. The tags are in this form:
> > > <meta name="TITLE" content="Some title" />
> > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> > >
> > > I would like to store the tags as two different fields in the index. I
> > > have tried the example explaining how to create a plugin, but the example
> > > is for Nutch 0.9 and only helps me get started.
> > >
> > > I think that I should look at:
> > >
> > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> > >
> > > and find the line:
> > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> > >
> > > But I'm not sure how to go on from here. Any help would be appreciated,
> > > and you are welcome to let me know if you know of an existing plugin that
> > > will index the meta tags.
> > >
> > >
> > >
> >
> >
>
>
>
>


