Hi,

Actually I have already done all that, as I followed the Nutch Wiki for this 
purpose: http://wiki.apache.org/nutch/IndexMetatags

Your suggestion to clean my segments and my Solr index and then re-index is a 
good idea. Could you help me with the commands for these three steps?

Many thanks!



----- Original Message -----
From: Ing. Eyeris Rodriguez Rueda <[email protected]>
To: [email protected]; ML mail <[email protected]>
Cc: 
Sent: Friday, May 11, 2012 7:55 PM
Subject: Re: Indexing HTML metatags from Nutch into Solr

Hello, I am using the index-metatags plugin (I suppose you have index-metatags 
in Nutch's plugins folder).
First you need to include something like this in the plugin.includes property 
in nutch-site.xml:
|index-(basic|anchor|metatags|more)|
You also need to list the meta tag names that you want to index (in the same 
file):
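The plugin.includes value is a regular expression matched against plugin ids. Below is a minimal Python sketch of that check. It assumes the whole plugin id has to match, which is my understanding of how Nutch applies the pattern. Apart from the index-(...) group quoted above, the PLUGIN_INCLUDES value is hypothetical, and so is the function name.

```python
import re

# Hypothetical plugin.includes value: only the index-(...) group comes from
# the message above; the other groups are assumed for illustration.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika|metatags)"
    "|index-(basic|anchor|metatags|more)|scoring-opic"
)


def is_plugin_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the whole plugin id matches the plugin.includes regex."""
    return re.fullmatch(includes, plugin_id) is not None


for pid in ["index-metatags", "index-more", "index-static", "parse-metatags"]:
    print(pid, is_plugin_included(pid))
```

Because the match is over the whole id, a plugin such as index-static stays disabled unless it is added to one of the alternation groups.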
<property>
    <name>metatags.names</name>
    <value>category;keywords;author;comments;description;subject;last_modified</value>
    <description>For plugin index-metatags: Indicate here the name of the
    html meta tag that should be
    parsed. Use a semicolon separated list if you want multiple
    tags, or use '*' to index all.
    Example: description;keywords;role
</description>
</property>
I only have these: 
category;keywords;author;comments;description;subject;last_modified.
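The property description says the value is a semicolon-separated list of tag names, or '*' to index all tags. Here is a small Python sketch of that interpretation. The function names are hypothetical, and the exact, case-sensitive comparison is an assumption, not documented plugin behaviour.

```python
from typing import List, Optional


def parse_metatag_names(value: str) -> Optional[List[str]]:
    """Split a metatags.names value into tag names.

    Returns None for '*', meaning "index all tags"; otherwise a list of
    names, with surrounding spaces and empty entries dropped.
    """
    value = value.strip()
    if value == "*":
        return None
    return [name.strip() for name in value.split(";") if name.strip()]


def should_index(tag: str, names: Optional[List[str]]) -> bool:
    """True if all tags are wanted or the tag is listed (exact match assumed)."""
    return names is None or tag in names


names = parse_metatag_names(
    "category;keywords;author;comments;description;subject;last_modified")
print(should_index("keywords", names), should_index("robots", names))
```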
After that, you have to configure your solrindex-mapping.xml like this:
<field dest="subject" source="subject" />
<field dest="description" source="description" />
<field dest="comments" source="comments" />
<field dest="author" source="author"/>
<field dest="keywords" source="keywords" />
<field dest="category" source="category" />
<field dest="lastModified" source="lastModified"/>
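Each mapping line renames a field produced by Nutch (source) to a Solr field (dest). This means the source must match the field name the indexing plugin actually emits. The Python sketch below illustrates that. It assumes that unmapped fields keep their original name. Whether the plugin emits "keywords" or "metatag.keywords" depends on the plugin version, so the sample documents are hypothetical, as are the function names.

```python
import xml.etree.ElementTree as ET

# Two of the mapping lines above, wrapped in the <mapping><fields>
# elements that solrindex-mapping.xml uses.
MAPPING_XML = """<mapping><fields>
<field dest="description" source="description"/>
<field dest="keywords" source="keywords"/>
</fields></mapping>"""


def load_mapping(xml_text):
    """Return a {source: dest} dict from solrindex-mapping XML."""
    root = ET.fromstring(xml_text)
    return {f.get("source"): f.get("dest") for f in root.iter("field")}


def apply_mapping(nutch_doc, mapping):
    """Rename fields per the mapping; unmapped fields keep their name (assumed)."""
    return {mapping.get(name, name): value for name, value in nutch_doc.items()}


mapping = load_mapping(MAPPING_XML)
# Hypothetical documents, as two different plugin versions might emit them.
print(apply_mapping({"keywords": "nutch, solr"}, mapping))
print(apply_mapping({"metatag.keywords": "nutch, solr"}, mapping))
```

In the second case, the source name does not match, so the field reaches Solr under its original name.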

I suggest cleaning your segments and your Solr index and then re-indexing.
I think that will solve your problem.
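For the first two of those steps, here is a minimal Python sketch. It assumes Solr's XML update handler is at http://localhost:8983/solr/update (the usual example setup) and that the segments live under crawl/segments. Both locations are assumptions, so adjust them to your installation. SOLR_UPDATE_URL, SEGMENTS_DIR and the function names are hypothetical. The script sends a delete-by-query for *:* with commit=true and removes the segments directory.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed locations; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"
SEGMENTS_DIR = Path("crawl/segments")

# Solr XML update message that deletes every document.
DELETE_ALL_BODY = b"<delete><query>*:*</query></delete>"


def build_delete_request(update_url=SOLR_UPDATE_URL):
    """Build a POST that deletes all documents and commits immediately."""
    return urllib.request.Request(
        update_url + "?commit=true",
        data=DELETE_ALL_BODY,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def clear_solr_index(update_url=SOLR_UPDATE_URL):
    """Send the delete-all request; returns the HTTP status code."""
    with urllib.request.urlopen(build_delete_request(update_url)) as resp:
        return resp.status


def remove_segments(segments_dir=SEGMENTS_DIR):
    """Delete the whole segments directory. This cannot be undone."""
    if segments_dir.exists():
        shutil.rmtree(segments_dir)
```

Re-indexing then means re-running your usual Nutch crawl and solrindex commands, so that the pages are fetched and parsed again with the metatags plugins enabled. Depending on your setup, you may also need to remove the crawldb, so that Nutch does not treat the pages as already fetched.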

****************************************************************************************

----- Original Message -----
From: "ML mail" <[email protected]>
To: [email protected]
Sent: Friday, May 11, 2012 6:40:36
Subject: Indexing HTML metatags from Nutch into Solr

Hello,

I am using Nutch 1.4 with Solr 3.6.0 and would like to get the HTML keywords 
and description metatags indexed into Solr. On the Nutch side I have followed 
http://wiki.apache.org/nutch/IndexMetatags to get Nutch parsing and 
extracting the metatags (using the index-metatags and parse-metatags plugins), 
but when I run solrindex they simply don't get indexed.

In Solr I am using the schema.xml provided by Nutch and have added the 
following fields for the metatags:
 
        <!-- fields for the metatags plugin -->
        <field name="metatag.description" type="text" stored="true" indexed="true"/>
        <field name="metatag.keywords" type="text" stored="true" indexed="true"/>

and have created a solrindex-mapping.xml file as follows:

<mapping>
<fields>
<field dest="description" source="metatag.description"/>
<field dest="keywords" source="metatag.keywords"/>
</fields>
</mapping>
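One thing worth checking is whether every dest name in the mapping is a field that schema.xml declares. Here, the schema additions use metatag.description and metatag.keywords, but the mapping writes to description and keywords. The Python sketch below compares the two snippets quoted above. It only sees the fields quoted here and ignores dynamicField patterns, so a full Nutch schema.xml could still declare or cover those names elsewhere. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The schema additions quoted above, wrapped in a root element so they parse.
SCHEMA_FIELDS = """<fields>
  <field name="metatag.description" type="text" stored="true" indexed="true"/>
  <field name="metatag.keywords" type="text" stored="true" indexed="true"/>
</fields>"""

# The solrindex-mapping.xml quoted above.
MAPPING = """<mapping>
<fields>
<field dest="description" source="metatag.description"/>
<field dest="keywords" source="metatag.keywords"/>
</fields>
</mapping>"""


def undeclared_dest_fields(schema_xml, mapping_xml):
    """List mapping dest names that are not declared as schema fields."""
    declared = {f.get("name") for f in ET.fromstring(schema_xml).iter("field")}
    dests = [f.get("dest") for f in ET.fromstring(mapping_xml).iter("field")]
    return [d for d in dests if d not in declared]


print(undeclared_dest_fields(SCHEMA_FIELDS, MAPPING))  # prints ['description', 'keywords']
```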

The rest is a pretty much default Solr install. So my question is: why can't 
I see the metatags indexed in Solr? Did I perhaps forget to configure 
something in Solr?

Any suggestions are welcome.
