Below is output from the fetcher, along with the response headers captured
with the Firefox Web Developer toolbar.
I'd appreciate any thoughts. Perhaps it is something in the parser policy.
I've traced through the source code a bit and nothing jumped out at me...
-j
--
050923 020413 fetch okay, but can't parse
http://medicalcenter.osu.edu/pdfs/PatientEd/Materials/PDFDocs/procedure/handwsh.pdf,
reason: failed(2,0): No external command defined for contentType:
Response Headers -
http://medicalcenter.osu.edu/pdfs/PatientEd/Materials/PDFDocs/procedure/handwsh.pdf
Server: Microsoft-IIS/5.0
X-Powered-By: ASP.NET
Date: Fri, 23 Sep 2005 17:14:19 GMT
Content-Type: application/pdf
Accept-Ranges: bytes
Last-Modified: Mon, 21 Jun 2004 16:10:22 GMT
Etag: "02b341aa57c41:96b"
Content-Length: 85604
200 OK
050923 020507 fetch okay, but can't parse
http://vet.osu.edu/sa/atcenter/vm522/webweek2/bovhd9.html, reason:
failed(2,0): No external command defined for contentType:
Response Headers - http://vet.osu.edu/sa/atcenter/vm522/webweek2/bovhd9.html
Date: Fri, 23 Sep 2005 17:20:57 GMT
Server: Apache/1.3.33 (Darwin) PHP/4.3.11
Cache-Control: max-age=60
Expires: Fri, 23 Sep 2005 17:21:57 GMT
Last-Modified: Fri, 15 Apr 2005 15:49:06 GMT
Etag: "31dd9-1c0-425fe272"
Accept-Ranges: bytes
Content-Length: 448
Connection: close
Content-Type: text/html
200 OK
050923 021427 fetch okay, but can't parse
http://felix.us.ohio-state.edu/search/o?SEARCH=21305366, reason:
failed(2,0): No external command defined for contentType:
Response Headers - http://felix.us.ohio-state.edu/search/o?SEARCH=1755564
Server: III 100
Pragma: no-cache
Expires: 0
Date: Fri Sep 23 17:25:05 2005 GMT
MIME-version: 1.0
Set-Cookie: SESSION_ID=1127496305.29650; path=/
Content-Type: text/html; charset=UTF-8
200 OK
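All three log entries above end with nothing after "contentType:". A minimal sketch for pulling the URL and the reported content type out of such entries, assuming each entry is a single line in the fetcher log (they are wrapped in this mail). The pattern and the name parse_failure are illustrative only and are not part of Nutch:

```python
import re

# Assumed shape of one unwrapped log entry, based on the lines quoted above:
#   <date> <time> fetch okay, but can't parse <url>, reason: ... contentType:<type>
LOG_PATTERN = re.compile(
    r"fetch okay, but can't parse\s+(?P<url>\S+),\s+reason:\s+"
    r".*?contentType:\s*(?P<ctype>.*)$"
)


def parse_failure(line):
    """Return (url, content_type) for a parse-failure entry, or None if no match.

    content_type is an empty string when nothing follows "contentType:".
    """
    match = LOG_PATTERN.search(line.strip())
    if match is None:
        return None
    return match.group("url"), match.group("ctype").strip()


if __name__ == "__main__":
    import sys

    # Usage: python parse_failures.py < fetcher.log
    for raw_line in sys.stdin:
        parsed = parse_failure(raw_line)
        if parsed is not None:
            url, ctype = parsed
            print(url, "content type:", repr(ctype))
```

An empty string in the second position would be consistent with the parser being handed no content type at all, even though the server sent one.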
Vanderdray, Jake wrote:
What's the URL? I think someone else had a similar problem, and
it turned out that the URL produced a redirect to a URL containing a
query string. Since Nutch was configured not to fetch URLs with query
strings, it just failed.
Jake.
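A rough way to test the redirect theory outside Nutch, sketched with the Python standard library. The helper names has_query_string and final_url are hypothetical; final_url needs network access and relies on urllib following redirects on its own, which may not match how the Nutch fetcher handles them:

```python
import urllib.request
from urllib.parse import urlsplit


def has_query_string(url):
    """Return True if the URL carries a non-empty query component."""
    return bool(urlsplit(url).query)


def final_url(url, timeout=10):
    """Fetch the URL, following redirects, and return the URL finally served."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


if __name__ == "__main__":
    import sys

    # Usage: python check_redirect.py <url> [<url> ...]
    for start in sys.argv[1:]:
        end = final_url(start)
        print(start)
        print("  redirected:", end != start)
        print("  final URL has query string:", has_query_string(end))
```

If the final URL differs from the requested one and has a query string, the URL filter configuration would be the next place to look.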
-----Original Message-----
From: Jon Shoberg [mailto:[EMAIL PROTECTED]
Sent: Friday, September 23, 2005 12:27 PM
To: [email protected]
Subject: No external command defined for contentType:
Anyone else get the message "No external command defined for
contentType:" without any sort of MIME content type declaration?
I can see HTML, PDF, and other documents getting fetched but then failing
in the parse step with the above message. When I go directly to the server
and request the document manually, I see a valid Content-Type header in
the HTTP response.
Anyone else seen this? I'm fetching content but not parsing it
reliably.
-j
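The manual check described in the original message can be sketched as a small script. This is only an illustration using the Python standard library; the function names are made up for the example, and fetch_content_type needs network access and a server that answers HEAD requests:

```python
import urllib.request


def base_mime_type(content_type_header):
    """Return the bare MIME type from a Content-Type value, or '' if absent.

    Parameters such as "; charset=UTF-8" are dropped.
    """
    if not content_type_header:
        return ""
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the raw Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


if __name__ == "__main__":
    import sys

    # Usage: python check_content_type.py <url> [<url> ...]
    for url in sys.argv[1:]:
        raw = fetch_content_type(url)
        print(url, repr(raw), "->", repr(base_mime_type(raw)))
```

A non-empty result here, next to an empty "contentType:" in the fetcher log, would suggest the value is being lost somewhere between the fetch and the parse rather than being missing at the server.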