Hi,

We use Nutch 2.1 and have a question about "parsechecker".
We get very little output after running this command:
bin/nutch parsechecker http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html

This is the output of Nutch 2.1:
---------
Url
---------------
http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
---------
Metadata
---------
language :      de

If we use the parsechecker of Nutch 1.x, we get much more output, with more information.


We use these plugins (nutch-site.xml):

  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|static|more|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|urlmeta|typo3-(accessrootline|base|index-keywords|parse-keywords|sitehash|uid)|headings</value>
  </property>


Why is the output of Nutch 2.1 so much smaller, and can we change it?
Thank you.

Daniel
