Nutch version 0.7.1. The site XML is:

...
<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|pdf)|index-basic|query-(basic|site|url)|language-identifier</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
...

As you can see, parse-html is included. I did not make any changes in parse-plugins.xml; I cannot even find it. What's the exact name? parse-plugins.xml? Where does it have to be?
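The plugin.includes value above is one regular expression, and per its description any plugin directory whose name does not match it is excluded. The Java sketch below checks a few plugin directory names against that exact value. It assumes the whole directory name must match the expression (a full match via java.util.regex `Matcher.matches()`); the class and method names are hypothetical and not part of the Nutch API.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value copied verbatim from the site XML above.
    static final Pattern INCLUDES = Pattern.compile(
        "nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|pdf)"
        + "|index-basic|query-(basic|site|url)|language-identifier");

    // Assumption: a plugin counts as included only if its entire
    // directory name matches the expression (not just a substring).
    static boolean isIncluded(String pluginDir) {
        return INCLUDES.matcher(pluginDir).matches();
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("parse-html", "parse-pdf", "parse-msword", "protocol-file");
        for (String dir : dirs) {
            System.out.println(dir + " -> " + (isIncluded(dir) ? "included" : "excluded"));
        }
    }
}
```

Under that assumption, both parse-html and parse-pdf pass this filter, while plugins such as parse-msword or protocol-file would be excluded.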
> --- Original Message ---
> From: "Jérôme Charron" <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: try to parse pdf
> Date: Tue, 14 Mar 2006 11:10:58 +0100
>
> > Result is always the same:
> > it still says
> > fetch okay, but can't parse
> > http://www.uni-koeln.de/uni/map.html, reason: failed(2,203):
> > Content-Type not application/pdf:
> >
> > Any other idea what's going wrong?
>
> Which version of Nutch do you use?
> Is the parse-html plugin activated?
> Did you make any changes in parse-plugins.xml?
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
