Additionally, we've harmonized the behaviour of 2.x and 1.x so that the next releases will be consistent. We should hopefully be releasing 2.2 very soon. You can look in the recent archive of this list to find the release candidate artifacts for 2.2.

Lewis
On Wednesday, June 5, 2013, feng lu <[email protected]> wrote:
> Hi Daniel
>
> I see that the Nutch 1.x parsechecker command dumps the ParseData, but in
> Nutch 2.x it only dumps the MetaData, so the output in Nutch 2.x is less
> than in Nutch 1.x.
>
> On Wed, Jun 5, 2013 at 5:27 PM, Daniel Hüsch <[email protected]> wrote:
>
>> Hi,
>>
>> we use Nutch 2.1 and have a question about the "parsechecker".
>> We get a very low output, after the command:
>> bin/nutch parsechecker http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
>>
>> This is the output of Nutch 2.1:
>> ---------
>> Url
>> ---------------
>> http://www.zim.uni-wuppertal.de/dienste/netzzugang/funklan/microsoft/xp-intelpro-wireless.html
>> ---------
>> Metadata
>> ---------
>> language : de
>>
>> If we use the parsechecker of Nutch 1.x, we get a bigger output with more
>> information!
>>
>> We use these plugins (nutch-site.xml):
>>
>> <property>
>> <name>plugin.includes</name>
>> <value>protocol-httpclient|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|static|more|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|urlmeta|typo3-(accessrootline|base|index-keywords|parse-keywords|sitehash|uid)|headings</value>
>> </property>
>>
>> Why is the output of Nutch 2.1 smaller and can we change it?
>> Thank you.
>>
>> Daniel
>>
>
> --
> Don't Grow Old, Grow Up... :-)

--
*Lewis*

