I never got this to work. If anybody has ideas for debugging, please
post them.

The problem is that the meta tags are never found or added to the Solr 
index. I have no idea why.
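
One thing that can be checked independently of Nutch is whether the fetched
pages really contain the tags in the expected form. Below is a minimal
sketch of such a check, assuming the tags look like the two examples quoted
further down. The names MetaTagCheck and extractMetaTags are made up for
illustration and are not part of Nutch or of the NUTCH-809 patch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone check, independent of Nutch: lists the <meta name/content>
// pairs found in an HTML string.
class MetaTagCheck {

    // Assumption: "name" comes before "content" and both values are
    // double-quoted, as in the examples from this thread. Other attribute
    // orders or quoting styles are not matched by this simple pattern.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> extractMetaTags(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Lower-case the name so "TITLE" and "title" share one key.
            tags.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"TITLE\" content=\"Some title\" />"
                + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />"
                + "</head><body></body></html>";
        // Print every name/content pair that was found.
        extractMetaTags(html).forEach((name, content) ->
                System.out.println(name + " = " + content));
    }
}
```

If the tags show up here but not in the index, the page content is probably
fine and the problem is more likely in the parsing or indexing configuration.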



Claus Daldorph Nielsen

Theilgaard Mortensen a/s
Niels Hemmingsens gade 9
1153 København K

Tlf: 33448555



Julien Nioche <[email protected]> wrote on 21-05-2010 13:33:
Subject: Re: Parse and index meta tags in Nutch 1.0

Have you checked the discussion in
http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html?
What have you modified in nutch-site.xml?

j.

On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:

> Julien,
>
> Thanks, it looks much like what I need. I have applied the patch, added
> the lines to nutch-site.xml and then rebuilt the Nutch project. But I
> still don't see any meta tags in my index. Do you have any suggestions as
> to what I might be doing wrong? Perhaps some configuration that I missed?
>
> Julien Nioche <[email protected]> wrote on 21-05-2010 09:39:
> Subject: Re: Parse and index meta tags in Nutch 1.0
>
> Claus,
>
> See https://issues.apache.org/jira/browse/NUTCH-809 and a related
> discussion on
> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>
> On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
>
> > Hi,
> >
> > I am new to Nutch and am trying to get it to index meta tags from HTML
> > pages and store them for searching in Solr. The tags have this form:
> > <meta name="TITLE" content="Some title" />
> > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
> >
> > I would like to store the tags as two different fields in the index. I
> > have tried the example explaining how to create a plugin, but the example
> > is for Nutch 0.9 and only helps me get started.
> >
> > I think that I should look at:
> >
> > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> >
> > and find the line:
> > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
> >
> > But I'm not sure how to go on from here. Any help would be appreciated,
> > and you are welcome to let me know if you know of an existing plugin
> > that will index the meta tags.


-- 
DigitalPebble Ltd
http://www.digitalpebble.com
