Hi Daniel

I see that the Nutch 1.x parsechecker command dumps the ParseData, but in
Nutch 2.x it only dumps the Metadata, so the output in Nutch 2.x is less
than in Nutch 1.x.


On Wed, Jun 5, 2013 at 5:27 PM, Daniel Hüsch <[email protected]> wrote:

> Hi,
>
> we use Nutch 2.1 and have a question about the "parsechecker".
> We get very little output after running the command:
> bin/nutch parsechecker http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
>
> This is the output of Nutch 2.1:
> ---------
> Url
> ---------------
> http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
> ---------
> Metadata
> ---------
> language :      de
>
> If we use the parsechecker of Nutch 1.x, we get a larger output with more
> information!
>
>
> We use these plugins (nutch-site.xml):
>
>   <property>
>     <name>plugin.includes</name>
>     <value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|static|more|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|urlmeta|typo3-(accessrootline|base|index-keywords|parse-keywords|sitehash|uid)|headings</value>
>   </property>
>
>
> Why is the output of Nutch 2.1 smaller and can we change it?
> Thank you.
>
> Daniel
>



-- 
Don't Grow Old, Grow Up... :-)
