Chirag said:

> Can you send me the link to a page that has this problem -- I'll run some
> tests to see what's causing this.

Unless I'm reading this thread incorrectly, the following site has
this malady:
  fetch okay, but can't parse
  http://www.tea.state.tx.us/waivers/granted.html,
  reason: Content-Type not application/msword:

The URLs that fail this way very rarely have a .doc extension. In a fetch
of some 200,000 pages I got this about 3,000 times (roughly 1.5%).
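
For anyone who wants to check their own fetch output the same way, here is a minimal sketch in Python that counts these failures and how many of the affected URLs actually end in .doc. It assumes the log message has the shape quoted above; the function name, the regular expression, and the example.org URL are illustrative assumptions, not part of Nutch.

```python
import re
from urllib.parse import urlparse

# Assumed message shape, taken from the log lines quoted above:
#   fetch okay, but can't parse <url>, reason: Content-Type not application/msword
FAILURE = re.compile(
    r"fetch okay, but can't parse\s+(?P<url>\S+?),\s*"
    r"reason: Content-Type not application/msword"
)

def tally_msword_failures(log_text):
    """Return (total failures, failures whose URL path ends in .doc)."""
    # The message may wrap across lines, so match against the whole text.
    urls = [m.group("url") for m in FAILURE.finditer(log_text)]
    with_doc = sum(1 for u in urls if urlparse(u).path.lower().endswith(".doc"))
    return len(urls), with_doc

# Hypothetical sample: the first entry is from this thread, the second is made up.
sample = """fetch okay, but can't parse
http://www.tea.state.tx.us/waivers/granted.html,
reason: Content-Type not application/msword:
fetch okay, but can't parse http://example.org/report.doc, reason: Content-Type not application/msword:
"""

total, with_doc = tally_msword_failures(sample)
print(total, with_doc)
```

If the count of .doc URLs is small next to the total, that would be consistent with the parser being chosen for pages that are not Word documents at all.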

Here are some other pages with this malady:
http://www.commerce.ubc.ca/  (a redirect)
http://www.siue.edu/BUSINESS/econfin/
http://www.nmhu.edu/business/
http://www.juntadeandalucia.es/economiayhacienda/

         - Bill

-- 
         *------------------------------------------------------*
         | Bill Goffe                 [EMAIL PROTECTED]          |
         | Department of Economics    voice: (315) 312-3444     |
         | SUNY Oswego                fax:   (315) 312-5444     |
         | 416 Mahar Hall             <wuecon.wustl.edu/~goffe> |          
         | Oswego, NY  13126                                    |
*--------*------------------------------------------------------*-----------*
| "Close to half of the teachers report spending `a great deal' of time     |
| preparing their students in test-taking skills."                          |
|  -- "Survey: Educators worry about 'teaching to test,'" January 10, 2001  |
|     <http://www.cnn.com/2001/US/01/10/standardized.test/index.html>       |
*---------------------------------------------------------------------------*



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers