Jack Tang wrote:
Hi Guys
Has anyone installed parse-rss and tried to fetch RSS feeds?
It failed on my side. I enabled the plugin and the feed was fetched,
but the RSS parser did not work.
My feed is http://www.craigslist.org/evs/index.rss
Here is the error:
org.apache.nutch.fetcher.Fetcher$FetcherThread [11] - fetch okay, but
can't parse http://beijing.craigslist.org/jjj/index.rss, reason:
failed(2,203): Content-Type not text/html: application/xml;
charset=ISO-8859-1
Hmmph... But the message says "text/html", did you notice? I bet it
comes from the HTML Parser, which is the last parser in the chain in the
default config (because it doesn't filter by pathSuffix).
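The fall-through described above can be illustrated with a deliberately simplified model. This is a hypothetical sketch, NOT Nutch's actual ParserFactory. It assumes each parser declares one content-type prefix, and that the last parser in the chain (parse-html) is used as a catch-all when nothing else matches. Under those assumptions, "application/xml" matches no declared type and lands on the HTML parser. That parser then rejects it with "Content-Type not text/html".

```java
import java.util.List;

/** Hypothetical, simplified model of a parser chain; NOT Nutch's ParserFactory. */
public class ParserChainSketch {

    /** A chain entry: plugin name plus the content-type prefix it claims to handle. */
    record ParserEntry(String name, String contentTypePrefix) {}

    // Assumed order: specific parsers first, the HTML parser last as the catch-all.
    static final List<ParserEntry> CHAIN = List.of(
        new ParserEntry("parse-rss", "text/xml"),
        new ParserEntry("parse-rss", "application/rss+xml"),
        new ParserEntry("parse-html", "text/html"));

    /** Returns the first parser whose declared type matches, else the last entry. */
    static String selectParser(String contentType) {
        for (ParserEntry e : CHAIN) {
            if (contentType != null && contentType.startsWith(e.contentTypePrefix())) {
                return e.name();
            }
        }
        // Nothing matched: fall through to the catch-all at the end of the chain.
        return CHAIN.get(CHAIN.size() - 1).name();
    }

    public static void main(String[] args) {
        System.out.println(selectParser("application/xml; charset=ISO-8859-1")); // parse-html
        System.out.println(selectParser("application/rss+xml"));                 // parse-rss
    }
}
```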
But other than that, your analysis is correct: there should probably be
an "application/xml" added to the list of handled content types. This
is further complicated, though, by the fact that Nutch doesn't do the
right thing now if you have more than one plugin handling the same MIME
type...
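One way a registry could cope with several plugins claiming the same MIME type is sketched below. It is purely hypothetical: the class, the method names and the "try each parser in order" policy are assumptions, not how Nutch currently behaves. It keeps every registered parser per base type instead of letting one replace another. It also strips parameters such as "; charset=ISO-8859-1" before the lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical registry keeping every parser registered for a MIME type, in order. */
public class ParserRegistrySketch {

    // Each parser returns Optional.empty() when it cannot handle the content.
    private final Map<String, List<Function<String, Optional<String>>>> byType =
        new LinkedHashMap<>();

    /** Strips parameters such as "; charset=ISO-8859-1" and normalizes case. */
    static String baseType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Appends rather than replaces, so a second plugin for the same type is kept. */
    void register(String mimeType, Function<String, Optional<String>> parser) {
        byType.computeIfAbsent(baseType(mimeType), k -> new ArrayList<>()).add(parser);
    }

    /** Tries each registered parser in order; returns the first successful result. */
    Optional<String> parse(String contentType, String content) {
        List<Function<String, Optional<String>>> parsers =
            byType.getOrDefault(baseType(contentType), List.of());
        for (Function<String, Optional<String>> p : parsers) {
            Optional<String> result = p.apply(content);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        // Two hypothetical plugins both claim application/xml.
        registry.register("application/xml",
            c -> c.contains("<rss") ? Optional.of("rss") : Optional.empty());
        registry.register("application/xml", c -> Optional.of("generic-xml"));

        System.out.println(registry.parse("application/xml; charset=ISO-8859-1",
            "<rss version=\"2.0\"/>"));                                  // Optional[rss]
        System.out.println(registry.parse("application/xml", "<catalog/>")); // Optional[generic-xml]
    }
}
```

Trying the parsers in registration order is only one possible policy. A real design would also need to decide who controls that order.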
The content-type is application/xml. Mattmann's code in parse-rss does this:
    // check that contentType is one we can handle
    String contentType = content.getContentType();
    if (contentType != null
        && (!contentType.startsWith("text/xml")
            && !contentType.startsWith("application/rss+xml")))
      return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
          "Content-Type not text/xml or application/rss+xml: "
          + contentType).getEmptyParse();
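If "application/xml" were added to the handled types, the check could look roughly like the standalone sketch below. The class name, the canHandle method and the list constant are hypothetical. Only the prefix-matching logic and the pass-through for a null content type mirror the parse-rss snippet.

```java
import java.util.List;

/** Hypothetical helper; names and the accepted list are assumptions, not parse-rss code. */
public class RssContentTypeCheck {

    // Content-type prefixes accepted once "application/xml" is added to the list.
    static final List<String> HANDLED_PREFIXES =
        List.of("text/xml", "application/rss+xml", "application/xml");

    /** Mirrors the original check: a null content type is let through. */
    static boolean canHandle(String contentType) {
        if (contentType == null) {
            return true;
        }
        for (String prefix : HANDLED_PREFIXES) {
            if (contentType.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canHandle("application/xml; charset=ISO-8859-1")); // true
        System.out.println(canHandle("text/html"));                           // false
    }
}
```

Like the original, this is a case-sensitive prefix match. It therefore also accepts types such as "application/xml-dtd", and comparing the base type exactly would be stricter.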
So it does not handle the "application/xml" content type yet?
Thanks
/Jack
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers