> Interesting.
> 
> I've been meaning to ask about a fair number of errors like
>   fetch okay, but can't parse
>   http://www.tea.state.tx.us/waivers/granted.html,
>   reason: Content-Type not application/msword:
> 
> When it very rarely has doc extension. Could this be the same 
> thing? In a recent fetch of some 200,000 pages I got this 
> about 3,000 times.

Could you send some more details from your logs? I need to know
the exact URL of a page that produces this error.

Maybe it is the same error, although the error I described is the
other way round. I had URLs like www.foo.com/foo that were really PDF
files or images, and Nutch parsed them with the HTML parser because it
didn't get the correct Content-Type!
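
To illustrate the kind of check that would catch the case I described, here is a rough sketch that looks at the first bytes of the fetched content and only falls back to the server's Content-Type when no known signature matches. This is not Nutch code: the class name ContentTypeSniffer and the method sniff are made up for this example, and the list of signatures is deliberately short.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: guesses a MIME type from the
// leading "magic" bytes of the content instead of trusting the header.
public class ContentTypeSniffer {

    private static final byte[] PDF  = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF  = "GIF8".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG  = {(byte) 0x89, 0x50, 0x4E, 0x47};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // Returns a sniffed type if a known signature matches,
    // otherwise the type declared by the server (may be null).
    public static String sniff(byte[] head, String declaredType) {
        if (startsWith(head, PDF))  return "application/pdf";
        if (startsWith(head, PNG))  return "image/png";
        if (startsWith(head, JPEG)) return "image/jpeg";
        if (startsWith(head, GIF))  return "image/gif";
        return declaredType;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A PDF served without an extension and labelled as HTML.
        byte[] head = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(head, "text/html"));
    }
}
```

With something like this in front of the parser selection, a URL such as www.foo.com/foo that returns a PDF would go to a PDF parser even if the server labels it text/html. It does not address your case directly, since an HTML page matches none of these signatures and keeps whatever type was declared.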

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers