Sven said: [snip]
> In consequence of that - any resource, without an appropriate file
> extension will be parsed with the "first plugin whose "pathSuffix" is
> the empty string". (see comment at
> net.nutch.parse.ParserFactory.getParser())

Interesting. I've been meaning to ask about a fair number of errors like

  fetch okay, but can't parse http://www.tea.state.tx.us/waivers/granted.html, reason: Content-Type not application/msword:

even though the pages involved very rarely have a .doc extension. Could this be the same thing? In a recent fetch of some 200,000 pages I got this error about 3,000 times.

Here are some other pages with this malady:

  http://www.commerce.ubc.ca/ (a redirect)
  http://www.siue.edu/BUSINESS/econfin/
  http://www.nmhu.edu/business/
  http://www.juntadeandalucia.es/economiayhacienda/

 - Bill

-- 
Bill Goffe [EMAIL PROTECTED]
Department of Economics, SUNY Oswego
416 Mahar Hall, Oswego, NY 13126
voice: (315) 312-3444  fax: (315) 312-5444
<wuecon.wustl.edu/~goffe>
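
To make the question concrete, here is a minimal sketch of the selection rule quoted above. It is not the ParserFactory code: the ParserSketch class, its Plugin type, the plugin order, and the assumption that the msword plugin declares an empty pathSuffix are all hypothetical, chosen only to show how a URL without an extension could end up at a parser that then rejects its Content-Type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Nutch classes.
final class ParserSketch {

    // Illustrative plugin descriptor: a name, the path suffix it claims
    // ("" if it claims none), and the Content-Type it accepts.
    static final class Plugin {
        final String name;
        final String pathSuffix;
        final String contentType;

        Plugin(String name, String pathSuffix, String contentType) {
            this.name = name;
            this.pathSuffix = pathSuffix;
            this.contentType = contentType;
        }
    }

    // Extension of the last path segment, or "" if there is none.
    static String suffixOf(String url) {
        String last = url.substring(url.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        return dot < 0 ? "" : last.substring(dot + 1);
    }

    // Return the first plugin whose pathSuffix equals the URL's suffix.
    // A URL with no extension has suffix "", so it matches the first
    // plugin that declares an empty pathSuffix.
    static Plugin select(List<Plugin> plugins, String url) {
        String suffix = suffixOf(url);
        for (Plugin p : plugins) {
            if (p.pathSuffix.equals(suffix)) {
                return p;
            }
        }
        return null; // no plugin claims this suffix
    }

    // Mimics the reported failure: the chosen plugin rejects content
    // whose Content-Type is not the one it handles.
    static String check(Plugin p, String contentType) {
        return p.contentType.equals(contentType)
                ? "ok"
                : "Content-Type not " + p.contentType;
    }

    public static void main(String[] args) {
        List<Plugin> plugins = new ArrayList<>();
        // Assumed order and suffix declarations, for illustration only.
        plugins.add(new Plugin("parse-msword", "", "application/msword"));
        plugins.add(new Plugin("parse-html", "html", "text/html"));

        String url = "http://www.siue.edu/BUSINESS/econfin/";
        Plugin chosen = select(plugins, url);
        System.out.println(chosen.name + ": " + check(chosen, "text/html"));
    }
}
```

Under these assumptions the directory-style URLs in the list fall through to the msword plugin and fail its Content-Type check. The sketch would not account for granted.html, which does carry an .html extension, so that case may have a different cause.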
