Hi, I'm getting a lot of the following "errors" when fetching a segment:

050721 094100 fetch okay, but can't parse http://www.sahunt.co.za/sahunter/recepies/biltongsoup.html, reason: failed(2,203): Content-Type not application/msword:

The page above is a pure HTML page. The fetch succeeds, but the page doesn't get parsed. Why?

The plugin.includes property in my nutch-site is as follows:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|msword|pdf|rss|rtf)|index-basic|query-(basic|site|url)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  By default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>

Any ideas?

Thanks!
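
For reference, here is a rough sanity check of which plugin directory names the plugin.includes expression selects. It is only a sketch in Python, not Nutch code: it assumes the expression has to match the whole directory name, and the helper name is_included and the extra plugin names in the loop are made up for illustration.

```python
import re

# The plugin.includes value from the nutch-site configuration in this post.
PLUGIN_INCLUDES = (
    r"protocol-httpclient|urlfilter-regex|"
    r"parse-(text|html|js|msword|pdf|rss|rtf)|"
    r"index-basic|query-(basic|site|url)"
)


def is_included(plugin_dir: str, pattern: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the whole directory name matches the expression.

    Assumption: the expression must match the entire name, not a substring.
    """
    return re.fullmatch(pattern, plugin_dir) is not None


if __name__ == "__main__":
    # protocol-http and parse-ext are listed only as examples of names
    # that the expression does not mention.
    for name in ["parse-html", "parse-msword", "parse-text",
                 "protocol-http", "parse-ext"]:
        print(name, is_included(name))
```

Under that assumption both parse-html and parse-msword are selected, so the expression itself does not seem to be excluding the HTML parser.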
