Sven said:

[snip]

> In consequence of that - any resource, without an appropriate file
> extension will be parsed with the "first plugin whose "pathSuffix" is
> the empty string". (see comment at
> net.nutch.parse.ParserFactory.getParser())

Interesting.

I've been meaning to ask about a fair number of errors like this:
  fetch okay, but can't parse
  http://www.tea.state.tx.us/waivers/granted.html, 
  reason: Content-Type not application/msword:

This happens even though the URL very rarely has a .doc extension. Could
this be the same thing? In a recent fetch of some 200,000 pages I got this
error about 3,000 times.
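
To make the question concrete, here is a minimal sketch of how a "first plugin whose pathSuffix is the empty string" rule could hand an HTML page to a Word parser. This is not the actual ParserFactory code: the Plugin class, suffixOf, choose, tryParse, and the registration order are all hypothetical names and assumptions of mine, used only to illustrate the suspected mechanism.

```java
import java.util.Arrays;
import java.util.List;

public class SuffixFallbackSketch {

    /** Hypothetical plugin descriptor; NOT the real Nutch plugin class. */
    static final class Plugin {
        final String name;
        final String pathSuffix;   // "" means "claims resources without an extension"
        final String contentType;  // the only Content-Type this parser accepts

        Plugin(String name, String pathSuffix, String contentType) {
            this.name = name;
            this.pathSuffix = pathSuffix;
            this.contentType = contentType;
        }
    }

    /** Extension of the last path segment, lower-cased, or "" if there is none. */
    static String suffixOf(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        return dot < 0 ? "" : last.substring(dot + 1).toLowerCase();
    }

    /** The first plugin whose pathSuffix equals the path's suffix wins; null if none. */
    static Plugin choose(List<Plugin> plugins, String path) {
        String suffix = suffixOf(path);
        for (Plugin p : plugins) {
            if (p.pathSuffix.equals(suffix)) {
                return p;
            }
        }
        return null;
    }

    /** Null on success, otherwise a reason string modelled on the error quoted above. */
    static String tryParse(Plugin p, String contentType) {
        return p.contentType.equals(contentType) ? null : "Content-Type not " + p.contentType;
    }

    public static void main(String[] args) {
        // Assumption: the Word plugin happens to be registered first with an empty suffix.
        List<Plugin> plugins = Arrays.asList(
            new Plugin("msword", "", "application/msword"),
            new Plugin("msword", "doc", "application/msword"),
            new Plugin("html", "html", "text/html"),
            new Plugin("html", "", "text/html"));

        String path = "/BUSINESS/econfin/";   // no file extension, served as text/html
        Plugin chosen = choose(plugins, path);
        System.out.println(path + " -> " + chosen.name + ", reason: " + tryParse(chosen, "text/html"));
    }
}
```

If something like this is what happens, the fix would be to choose by Content-Type first and fall back to the suffix only when no type is available. Note that in this sketch a path ending in .html would still reach the HTML parser, so it would not explain the granted.html case by itself.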

Here are some other pages with this malady:
http://www.commerce.ubc.ca/  (a redirect)
http://www.siue.edu/BUSINESS/econfin/
http://www.nmhu.edu/business/
http://www.juntadeandalucia.es/economiayhacienda/

         - Bill

-- 
         *------------------------------------------------------*
         | Bill Goffe                 [EMAIL PROTECTED]          |
         | Department of Economics    voice: (315) 312-3444     |
         | SUNY Oswego                fax:   (315) 312-5444     |
         | 416 Mahar Hall             <wuecon.wustl.edu/~goffe> |          
         | Oswego, NY  13126                                    |
*--------*------------------------------------------------------*-----------*
| "Only one astronaut has ever taken the terrifying basket ride; a safety   |
| official who took part in that same trial run screamed all the way down." |
|   -- A description of riding the emergency slide from the shuttle on its  |
|      launch pad -- it starts 200 feet off the ground and is 1,200 feet    |
|      long. CNN, 4/5/99                                                    |
*---------------------------------------------------------------------------*


