Nutch version: 0.7.1
The site XML is:
...<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|pdf)|index-basic|query-(basic|site|url)|language-identifier</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
...
As you can see, parse-html is included.
I did not make any changes to parse-plugins.xml.
I cannot even find it.
What is the exact file name? parse-plugins.xml?
Where is it supposed to be located?
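
The claim that parse-html is covered by the plugin.includes value above can be checked outside Nutch. The following is only a rough sketch in Python, not Nutch code: the helper name is_included is hypothetical, and it assumes that the whole plugin directory name has to match the expression, which may differ from what Nutch actually does internally.

```python
import re

# Value of plugin.includes, copied from the site XML above.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|protocol-http|urlfilter-regex|"
    "parse-(text|html|js|pdf)|index-basic|query-(basic|site|url)|"
    "language-identifier"
)


def is_included(plugin_dir: str, pattern: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the directory name matches the expression.

    Assumption (hypothetical helper): the entire name must match,
    not just a substring of it.
    """
    return re.fullmatch(pattern, plugin_dir) is not None


if __name__ == "__main__":
    # parse-msword is just an example of a name that is not listed.
    for name in ("parse-html", "parse-pdf", "parse-msword"):
        print(name, is_included(name))
```

Under that assumption, parse-html and parse-pdf both match the expression, so the plugin.includes value itself does not appear to be what excludes the HTML parser.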


> --- Original Message ---
> From: "Jérôme Charron" <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: Re: try to parse pdf
> Date: Tue, 14 Mar 2006 11:10:58 +0100
> 
> > Result is always the same:
> > it still says
> > fetch okay, but can't parse
> > http://www.uni-koeln.de/uni/map.html, reason: failed(2,203):
> > Content-Type not application/pdf:
> >
> > Any other idea what's going wrong?
> 
> Which version of Nutch do you use?
> Is the parse-html plugin activated?
> Did you make any changes in parse-plugins.xml?
> 
> Jérôme
> 
> --
> http://motrech.free.fr/
> http://www.frutch.org/
> 
