Following is the output from the fetcher, along with the response headers as shown by the Firefox Web Developer toolbar.

I'd appreciate any thoughts; perhaps this is something for the parser policy. I've traced the source code a bit, and nothing jumped out at me.

-j

--

050923 020413 fetch okay, but can't parse http://medicalcenter.osu.edu/pdfs/PatientEd/Materials/PDFDocs/procedure/handwsh.pdf, reason: failed(2,0): No external command defined for contentType:

Response Headers - http://medicalcenter.osu.edu/pdfs/PatientEd/Materials/PDFDocs/procedure/handwsh.pdf

Server: Microsoft-IIS/5.0
X-Powered-By: ASP.NET
Date: Fri, 23 Sep 2005 17:14:19 GMT
Content-Type: application/pdf
Accept-Ranges: bytes
Last-Modified: Mon, 21 Jun 2004 16:10:22 GMT
Etag: "02b341aa57c41:96b"
Content-Length: 85604

200 OK


050923 020507 fetch okay, but can't parse http://vet.osu.edu/sa/atcenter/vm522/webweek2/bovhd9.html, reason: failed(2,0): No external command defined for contentType:

Response Headers - http://vet.osu.edu/sa/atcenter/vm522/webweek2/bovhd9.html

Date: Fri, 23 Sep 2005 17:20:57 GMT
Server: Apache/1.3.33 (Darwin) PHP/4.3.11
Cache-Control: max-age=60
Expires: Fri, 23 Sep 2005 17:21:57 GMT
Last-Modified: Fri, 15 Apr 2005 15:49:06 GMT
Etag: "31dd9-1c0-425fe272"
Accept-Ranges: bytes
Content-Length: 448
Connection: close
Content-Type: text/html

200 OK


050923 021427 fetch okay, but can't parse http://felix.us.ohio-state.edu/search/o?SEARCH=21305366, reason: failed(2,0): No external command defined for contentType:

Response Headers - http://felix.us.ohio-state.edu/search/o?SEARCH=1755564

Server: III 100
Pragma: no-cache
Expires: 0
Date: Fri Sep 23 17:25:05 2005 GMT
MIME-version: 1.0
Set-Cookie: SESSION_ID=1127496305.29650; path=/
Content-Type: text/html; charset=UTF-8

200 OK
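Each of the three header dumps above includes a Content-Type header, which suggests the empty type in the log lines is not caused by the server omitting the header. Below is a minimal Java sketch for checking a pasted dump mechanically. It assumes one `Name: value` pair per line, as in the dumps. `ContentTypeCheck`, `parseHeaders` and `mimeType` are hypothetical names, not Nutch code. The sketch strips parameters such as `; charset=UTF-8` (as in the third response) and lower-cases header names, because HTTP header names are case-insensitive (note `Etag` above).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch): parses a raw "Name: value"
 * header dump and extracts the bare MIME type from Content-Type.
 */
public class ContentTypeCheck {

    /** Parses "Name: value" lines into a map keyed by lower-cased header name. */
    static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : raw.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank lines and status lines such as "200 OK"
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    /** Drops parameters such as "; charset=UTF-8"; returns "" when the header is absent. */
    static String mimeType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Lines copied from the third dump above.
        String raw = String.join("\n",
            "Server: III 100",
            "Date: Fri Sep 23 17:25:05 2005 GMT",
            "Content-Type: text/html; charset=UTF-8",
            "",
            "200 OK");
        String type = mimeType(parseHeaders(raw).get("content-type"));
        System.out.println(type.isEmpty() ? "no Content-Type" : type);
    }
}
```

Run against the third dump, this prints `text/html`. An empty result would indicate the header really was missing.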





Vanderdray, Jake wrote:
What's the URL? I think someone else had a similar problem, and it
turned out that the URL produced a redirect to a URL containing a
query string. Since Nutch was configured not to fetch URLs with query
strings, it just failed.

Jake.
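Jake's explanation depends on how URL filter rules are evaluated. Below is a minimal Java sketch of first-match-wins filtering with `+regex` (accept) and `-regex` (reject) lines, written in the style of Nutch's regex URL filter files. It assumes the default filter files of that era include a query-skipping rule like `-[?*!@=]`, so check your own conf. `QueryUrlFilterSketch` is a hypothetical class, not Nutch's implementation, and the `example.edu` URLs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's RegexURLFilter): first-match-wins URL
 * filtering with "+regex" (accept) and "-regex" (reject) rule lines.
 */
public class QueryUrlFilterSketch {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Each rule starts with '+' or '-'; blank lines and '#' comments are ignored. */
    QueryUrlFilterSketch(String... ruleLines) {
        for (String raw : ruleLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first rule whose regex is found anywhere in the URL decides; no match means reject. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        QueryUrlFilterSketch filter = new QueryUrlFilterSketch(
            "# skip URLs containing characters typical of query strings",
            "-[?*!@=]",
            "+^http://",
            "-.");
        // Hypothetical redirect from a clean URL to one carrying a query string.
        String original = "http://example.edu/catalog";
        String redirectTarget = "http://example.edu/catalog?SEARCH=1";
        System.out.println(original + " -> " + filter.accepts(original));
        System.out.println(redirectTarget + " -> " + filter.accepts(redirectTarget));
    }
}
```

This prints `true` for the original URL and `false` for the redirect target. The rules are evaluated in order and the first match wins, so a broad reject such as `-[?*!@=]` placed before the accept rule blocks a query-string URL even on a host the crawl otherwise accepts.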

-----Original Message-----
From: Jon Shoberg [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 23, 2005 12:27 PM
To: [email protected]
Subject: No external command defined for contentType:

Has anyone else gotten the message "No external command defined for contentType:" with no MIME content type named after the colon?

I can see HTML, PDF, and other documents being fetched but then failing to parse with the above message. When I go directly to the server and retrieve a document manually, the HTTP response includes a valid Content-Type header.

Has anyone else seen this? I'm fetching content but not parsing it reliably.
-j


