Hi,
we use Nutch 2.1 and have a question about the "parsechecker".
We get very little output from the command:
bin/nutch parsechecker http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
This is the output of Nutch 2.1:
---------
Url
---------------
http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
---------
Metadata
---------
language : de
If we use the parsechecker of Nutch 1.x, we get much more output, with
more information.
We use these plugins (nutch-site.xml):
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|static|more|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|urlmeta|typo3-(accessrootline|base|index-keywords|parse-keywords|sitehash|uid)|headings</value>
</property>
Why is the output of Nutch 2.1 smaller, and can we change it?
Thank you.
Daniel