You should see it with the parsechecker tool but not with the
indexchecker, because you don't have an indexing filter plugin included
that reads and emits what's output by the parse filter. Use the
index-metadata plugin.
On Thu, 3 May 2012 00:25:42 -0700 (PDT), ML mail <[email protected]>
wrote:
Dear Lewis,
Thanks for the README about the parse-metatags plugin. I have now
double-checked, and I have the metatags.names property in my
nutch-site.xml config file as well as the other required properties.
Still, when running "nutch indexchecker URL" I don't see any
description or keywords fields :(
Below I have pasted the relevant parts of my nutch-site.xml config
file:
<property>
<name>index.parse.md</name>
<value>metatag.description,metatag.keywords</value>
</property>
<property>
<name>metatags.names</name>
<value>description;keywords</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
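The three properties above have to agree with each other: plugin.includes must activate both parse-metatags and index-metadata, and every tag named in metatags.names should appear in index.parse.md with the "metatag." prefix. Here is a minimal, hypothetical Python sketch (not a Nutch tool) that checks this. It assumes two things: that Nutch matches plugin.includes against each plugin ID as a full regular-expression match, and that parse-metatags stores each tag under the key metatag.<name>, as the config in this thread implies.

```python
import re
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Nutch-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_metatags_config(props):
    """Return a list of problems; an empty list means the properties agree."""
    problems = []
    # Assumption: each plugin ID must fully match the plugin.includes regex.
    includes = re.compile(props.get("plugin.includes", ""))
    for plugin_id in ("parse-metatags", "index-metadata"):
        if not includes.fullmatch(plugin_id):
            problems.append(f"plugin.includes does not activate {plugin_id}")
    # metatags.names is ';'-separated, index.parse.md is ','-separated (as in this thread).
    names = [n.strip() for n in props.get("metatags.names", "").split(";") if n.strip()]
    indexed = {k.strip() for k in props.get("index.parse.md", "").split(",") if k.strip()}
    for name in names:
        # Assumption: parse-metatags stores each tag as "metatag.<name>".
        key = "metatag." + name
        if key not in indexed:
            problems.append(f"index.parse.md does not list {key}")
    return problems


# The fragment from this thread, wrapped in a <configuration> root element.
SAMPLE = """<configuration>
<property><name>index.parse.md</name><value>metatag.description,metatag.keywords</value></property>
<property><name>metatags.names</name><value>description;keywords</value></property>
<property><name>plugin.includes</name><value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value></property>
</configuration>"""

print(check_metatags_config(load_properties(SAMPLE)))
```

For the fragment quoted here the list comes back empty. That only shows the properties are consistent with each other. It does not show that the copy of nutch-site.xml actually used by runtime/local contains these edits.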
As far as I know this all looks correct, but maybe you can see
something wrong? Or is there anything else I might check?
Regards
________________________________
From: Lewis John Mcgibbney <[email protected]>
To: [email protected]; ML mail <[email protected]>
Sent: Wednesday, May 2, 2012 12:49 PM
Subject: Re: Indexing meta tags in Nutch 1.4
Hi,
Please also see the README Julien kindly provided with the
parse-metatags plugin.
https://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-metatags/README.txt?view=markup
I'm hoping there should be enough info to get it working flawlessly.
Remember, after making any changes to your config files you should really
recompile before moving on to a more serious deployment.
On Tue, May 1, 2012 at 12:38 PM, ML mail <[email protected]> wrote:
Hi Lewis,
Thanks to your explanations, I managed to get the parse-metatags
plugin built and installed into the runtime/local/plugins directory.
So now I have the index-metatags plugin from the ZIP file as well as the
parse-metatags plugin from the patch installed, and wanted to check that
they are working. I followed the guide
at http://wiki.apache.org/nutch/IndexMetatags step by step and reached the part
where you check for the metatag fields with the "nutch indexchecker URL"
command. Unfortunately, in the output of that command I don't
see any keywords or description fields :( just the usual ones
(site, title, content, etc.).
Am I missing something here?
Also let me know if you need more details or my nutch-site.xml
config file...
Regards
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350