And here is the mail he tried to post:

---------- Forwarded message ----------
From: Steve Miller <[email protected]>
Date: Mon, Sep 12, 2010 at 2:15 AM
Subject: Tolerance to malformed media types in Wink client
To: [email protected]

Hi,

I created a crawler using the Apache Wink client, but I found that the Wink client does not tolerate malformed media types, even when the malformed part is only a media type parameter. Unfortunately, there are a lot of those on the internet. When Wink receives such a media type, it throws an exception with the message:

'java.lang.IllegalArgumentException ... Verify that the format is like "type/subtype".'

I think it would be good if Wink could be more tolerant of such media types, especially since they are common. It would surely make my life easier :-)

Here are examples of the media types that cause the problem, and their sources. This is only a sample; the list of sites is longer, but the media type patterns repeat themselves.

URL: http://www.aol.com/ (and all AOL sites around the globe)
Media Type: text/html;;charset=utf-8

URL: http://www.plugrush.com/
Media Type: text/html; charset: UTF-8

URL: http://www.torrentleech.org/
Media Type: text/html; charset=

URL: http://www.comingsoon.net/
Media Type: text/html; $str_charset; charset=ISO-8859-1

URL: http://www.globalsources.com/
Media Type: text/html; UTF-8;charset=ISO-8859-1

URL: http://dic.academic.ru/
Media Type: text/html; utf-8

URL: http://www.warnerbros.com/
Media Type: text/html; UTF-8;charset=UTF-8

Thanks,
Steve
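
To illustrate the kind of tolerance being asked for, here is a minimal sketch of a client-side cleanup step. It does not use or reflect any Wink API: the class name LenientMediaType and the method sanitize are hypothetical, and the policy (keep the type/subtype, silently drop any parameter that is not a well-formed name=value pair) is an assumption rather than anything the standard or Wink prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Wink: cleans a raw Content-Type value
// before it is handed to a strict media type parser.
final class LenientMediaType {

    private LenientMediaType() {
    }

    // Returns "type/subtype" followed by the well-formed ";name=value"
    // parameters only. Malformed parameters are dropped, not repaired.
    static String sanitize(String raw) {
        String[] parts = raw.split(";");
        String base = parts[0].trim();

        // The type/subtype part itself must still be valid; only the
        // parameters are treated leniently.
        if (base.indexOf('/') <= 0 || base.endsWith("/")) {
            throw new IllegalArgumentException("Not a type/subtype: " + raw);
        }

        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String segment = parts[i].trim();
            int eq = segment.indexOf('=');
            // Skip empty segments (";;"), bare tokens ("utf-8"),
            // segments using ':' instead of '=', and empty values.
            if (eq <= 0 || eq == segment.length() - 1) {
                continue;
            }
            String name = segment.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = segment.substring(eq + 1).trim();
            if (name.isEmpty() || value.isEmpty() || name.indexOf(' ') >= 0) {
                continue;
            }
            params.put(name, value);
        }

        StringBuilder cleaned = new StringBuilder(base);
        for (Map.Entry<String, String> entry : params.entrySet()) {
            cleaned.append(';').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return cleaned.toString();
    }
}
```

With this policy, "text/html;;charset=utf-8" becomes "text/html;charset=utf-8", while "text/html; charset: UTF-8" and "text/html; utf-8" both fall back to plain "text/html". The cleaned string could then be passed to a strict parser such as javax.ws.rs.core.MediaType.valueOf. The sketch does not handle quoted parameter values that contain a semicolon.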
