Thanks, Markus, for your tip. I have now tried "parsechecker" and it works 
perfectly: I can see the "Parse Metadata" info, which contains the keywords and 
description. I therefore suppose the documentation on the 
wiki http://wiki.apache.org/nutch/IndexMetatags is wrong, as it mentions using 
"indexchecker" instead...




________________________________
 From: Markus Jelsma <[email protected]>
To: ML mail <[email protected]> 
Cc: Lewis John Mcgibbney <[email protected]>; [email protected] 
Sent: Thursday, May 3, 2012 9:32 AM
Subject: Re: Indexing meta tags in Nutch 1.4
 
You should see it with the parsechecker tool but not with the indexchecker, 
because you don't have an indexing filter plugin included that reads and emits 
what is output by the parse filter. Use the index-metadata plugin.
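
A rough way to sanity-check the configuration side of this is sketched here in Python. It is only an illustration under stated assumptions: that plugin.includes is matched as a full regular expression against plugin ids, and that the parse filter stores its values under a "metatag." prefix, as the config quoted in this thread suggests. The helper names (load_properties, plugin_enabled, missing_index_fields) are hypothetical and not part of Nutch.

```python
import re
import xml.etree.ElementTree as ET

# Property values copied from the nutch-site.xml quoted in this thread.
NUTCH_SITE = """<configuration>
<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords</value>
</property>
<property>
  <name>metatags.names</name>
  <value>description;keywords</value>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
</configuration>"""


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.iter("property")
    }


def plugin_enabled(props, plugin_id):
    """Assumption: plugin.includes must match the whole plugin id."""
    return re.fullmatch(props["plugin.includes"], plugin_id) is not None


def missing_index_fields(props):
    """List metatags that are parsed but not named in index.parse.md.

    Assumption: parsed names carry a "metatag." prefix, as in the quoted config.
    """
    parsed = {
        "metatag." + name.strip()
        for name in props["metatags.names"].split(";")
        if name.strip()
    }
    indexed = {
        name.strip()
        for name in props["index.parse.md"].split(",")
        if name.strip()
    }
    return sorted(parsed - indexed)


props = load_properties(NUTCH_SITE)
for plugin_id in ("parse-metatags", "index-metadata"):
    print(plugin_id, "enabled:", plugin_enabled(props, plugin_id))
print("parsed but not indexed:", missing_index_fields(props))
```

This only inspects the configuration text. It cannot tell whether the index-metadata plugin is actually present in runtime/local/plugins, so a clean result here does not by itself guarantee that indexchecker will show the fields.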

On Thu, 3 May 2012 00:25:42 -0700 (PDT), ML mail <[email protected]> wrote:
> Dear Lewis,
> 
> Thanks for the README about the parse-metatags plugin. I have now
> double checked and I have the metatags.names property in my
> nutch-site.xml config file as well as the other required properties.
> Still when running "nutch indexchecker URL" I don't see any
> description or keywords fields :( 
> 
> Below I have pasted the relevant parts of my nutch-site.xml config file:
> 
> <property>
>         <name>index.parse.md</name>
>         <value>metatag.description,metatag.keywords</value>
> </property>
> 
> 
> <property>
>         <name>metatags.names</name>
>         <value>description;keywords</value>
> </property>
> 
> 
> <property>
>         <name>plugin.includes</name>
>         <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
> 
> As far as I know this all looks correct, but maybe you can see
> something wrong, or anything else I might check?
> 
> Regards
> 
> 
> 
> ________________________________
>  From: Lewis John Mcgibbney <[email protected]>
> To: [email protected]; ML mail <[email protected]>
> Sent: Wednesday, May 2, 2012 12:49 PM
> Subject: Re: Indexing meta tags in Nutch 1.4
> 
> Hi,
> 
> Please also see the README Julien kindly provided with the
> parse-metatags plugin.
> 
> 
> https://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-metatags/README.txt?view=markup
> 
> I'm hoping there should be enough info to get it working flawlessly.
> Remember, after any changes you make to your config files you should
> really rebuild before moving on to a more serious deployment.
> 
> On Tue, May 1, 2012 at 12:38 PM, ML mail <[email protected]> wrote:
>> Hi Lewis,
>> 
>> Thanks to your explanations, I managed to get the parse-metatags plugin 
>> built and installed into the runtime/local/plugins directory. So now I have 
>> the index-metatags from the ZIP file as well as the parse-metatags plugin 
>> from the patch installed and wanted to check if they are working. I followed 
>> step-by-step the guide on http://wiki.apache.org/nutch/IndexMetatags and 
>> came to the part where you check with the "nutch indexchecker URL" command 
>> for the metatag fields. Unfortunately, in the output of that command I don't 
>> see any keywords or description fields :( just the usual ones 
>> (site,title,content,etc).
>> 
>> Am I missing something here?
>> 
>> Also let me know if you need more details or my nutch-site.xml config file...
>> 
>> Regards

-- Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
