Julien,

Thanks, this looks very much like what I need. I have applied the patch, added 
the lines to nutch-site.xml, and then rebuilt the Nutch project. But I still 
don't see any metatags in my index. Do you have any suggestions as to what 
I might be doing wrong? Perhaps some configuration that I missed?



Claus Daldorph Nielsen

Theilgaard Mortensen a/s
Niels Hemmingsens gade 9
1153 København K

Tlf: 33448555



From: Julien Nioche <[email protected]>
Date: 21-05-2010 09:39
Reply-To: [email protected]
To: [email protected]
Subject: Re: Parse and index meta tags in Nutch 1.0

Claus,

See https://issues.apache.org/jira/browse/NUTCH-809 and a related discussion on
http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:

> Hi,
>
> I am new to Nutch and am trying to get it to index meta tags from HTML
> pages and store them for searching in Solr. The tags are in this form:
> <meta name="TITLE" content="Some title" />
> <meta name="KEYWORDS" content="Forum, help, build, stuff" />
>
> I would like to store the tags as two different fields in the index. I
> have tried the example explaining how to create a plugin, but the example
> is for Nutch 0.9 and only helps me get started.
>
> I think that I should look at:
>
> $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>
> and find the line:
> HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>
> But I'm not sure how to go on from here. Any help would be appreciated, and
> please let me know if you know of an existing plugin that will
> index the meta tags.
>
>
>
> Claus Daldorph Nielsen
>
> Theilgaard Mortensen a/s
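
For illustration only, the goal described in the original question (turning the TITLE and KEYWORDS meta tags into two separate fields) can be sketched outside of Nutch in plain Java. This is not the NUTCH-809 patch and does not use any Nutch API; the class name MetaTagSketch and the method extract are hypothetical, and the regular expression assumes double-quoted attributes with name appearing before content, as in the two example tags. A real plugin would more likely walk the parsed DOM than use a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of Nutch or the NUTCH-809 patch.
class MetaTagSketch {

    // Assumption: attributes are double-quoted and "name" precedes "content",
    // as in the example tags from the question.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    /** Returns meta tag names (lower-cased) mapped to their content values. */
    static Map<String, String> extract(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Lower-case the name so "TITLE" and "title" map to the same field.
            tags.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<meta name=\"TITLE\" content=\"Some title\" />\n"
                    + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />";
        Map<String, String> tags = extract(html);
        // Each entry corresponds to one candidate index field.
        System.out.println("title    = " + tags.get("title"));
        System.out.println("keywords = " + tags.get("keywords"));
    }
}
```

Each key in the returned map would correspond to one field in the index; how those fields are actually passed on to Solr depends on the Nutch indexing plugin in use and is not shown here.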
