> Interesting.
>
> I've been meaning to ask about a fair number of errors like
> fetch okay, but can't parse
> http://www.tea.state.tx.us/waivers/granted.html,
> reason: Content-Type not application/msword:
>
> When it very rarely has doc extension. Could this be the same
> thing? In a recent fetch of some 200,000 pages I got this
> about 3,000 times.

Could you send some more details from your logs? I need the exact URL of a page that produces this error.

It may be the same error, although the one I described is the other way round: I had URLs like www.foo.com/foo that serve real PDF files or real images, and Nutch parsed them with the HTML parser because it didn't get the correct content type.
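
To illustrate the kind of mismatch I mean, here is a minimal sketch of checking the first bytes of the fetched content before trusting the Content-Type header. It is not Nutch code: the class ContentTypeSniffer and its methods sniff and resolve are hypothetical names made up for this example, and the list of signatures is deliberately short.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: guesses a MIME type from magic bytes.
class ContentTypeSniffer {

    // Returns a MIME type if the content starts with a known signature, else null.
    static String sniff(byte[] content) {
        if (startsWith(content, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(content, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) {
            return "image/png";
        }
        if (startsWith(content, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF})) {
            return "image/jpeg";
        }
        if (startsWith(content, "GIF8".getBytes(StandardCharsets.US_ASCII))) {
            return "image/gif";
        }
        return null;
    }

    // Prefers the sniffed type; otherwise falls back to the header value
    // (parameters such as "; charset=..." stripped), then to a generic default.
    static String resolve(String headerType, byte[] content) {
        String sniffed = sniff(content);
        if (sniffed != null) {
            return sniffed;
        }
        if (headerType != null && !headerType.trim().isEmpty()) {
            int semi = headerType.indexOf(';');
            String base = semi >= 0 ? headerType.substring(0, semi) : headerType;
            return base.trim().toLowerCase();
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] content, byte[] prefix) {
        if (content == null || content.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (content[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, a PDF served from www.foo.com/foo with a text/html header would resolve to application/pdf, so it would not be handed to the HTML parser. Whether the sniffed type should always win over the header is a design choice; this sketch assumes it should.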
