Hi

I'm getting a lot of the following "errors" when fetching a
segment:

050721 094100 fetch okay, but can't parse
http://www.sahunt.co.za/sahunter/recepies/biltongsoup.html,
reason: failed(2,203): Content-Type not application/msword:


The page above is a plain HTML page, and the fetch succeeds,
yet it does not get parsed.

The plugin.includes setting in my nutch-site.xml is as follows:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|msword|pdf|rss|rtf)|index-basic|query-(basic|site|url)</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.  By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
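
To double-check which plugin directories that expression selects, here is a minimal Java sketch. The class and method names are hypothetical and not part of Nutch, and it assumes the expression has to match the whole directory name, which I have not verified against the Nutch source.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests plugin directory names
// against the plugin.includes value from the property above.
final class PluginIncludesCheck {
    // Value copied from the plugin.includes property.
    static final Pattern INCLUDES = Pattern.compile(
        "protocol-httpclient|urlfilter-regex|parse-(text|html|js|msword|pdf|rss|rtf)"
        + "|index-basic|query-(basic|site|url)");

    // Assumption: the whole directory name must match the expression.
    static boolean isIncluded(String pluginDir) {
        return INCLUDES.matcher(pluginDir).matches();
    }

    public static void main(String[] args) {
        // Example directory names; the last two are not listed in the value.
        String[] names = {"parse-html", "parse-msword", "parse-ext", "index-more"};
        for (String name : names) {
            System.out.println(name + " -> " + isIncluded(name));
        }
    }
}
```

Under that assumption both parse-html and parse-msword are included, so the HTML parser should at least be enabled by this setting.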


Any ideas?
Thanks!


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
