And here is the mail he tried to post

---------- Forwarded message ----------
From: Steve Miller <[email protected]>
Date: Mon, Sep 12, 2010 at 2:15 AM
Subject: Tolerance to malformed media types in Wink client
To: [email protected]

Hi,
I created a crawler using the Apache Wink client, but I found that the Wink 
client is not tolerant of malformed media types, even when the malformed part is 
only a media type parameter. Unfortunately, there are a lot of those on the 
internet.
When Wink receives such a media type, it throws an exception with the message: 
'java.lang.IllegalArgumentException ... Verify that the format is like 
"type/subtype".'
I think it would be good if Wink could be more tolerant of such media types, 
especially since they are common. It would surely save me a lot of time :-)

Here are examples of the media types that cause the problem, with their sources. 
This is only a sample; the list of sites is longer, but the same media type 
patterns keep recurring.

URL: http://www.aol.com/ (and all AOL sites around the globe)
Media Type: text/html;;charset=utf-8

URL: http://www.plugrush.com/
Media Type: text/html; charset: UTF-8

URL: http://www.torrentleech.org/
Media Type: text/html; charset=

URL: http://www.comingsoon.net/
Media Type: text/html; $str_charset; charset=ISO-8859-1

URL: http://www.globalsources.com/
Media Type: text/html; UTF-8;charset=ISO-8859-1

URL: http://dic.academic.ru/
Media Type: text/html; utf-8

URL: http://www.warnerbros.com/
Media Type: text/html; UTF-8;charset=UTF-8

Thanks,
Steve
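For illustration only, here is a minimal Java sketch of the kind of tolerance requested above. It is not Wink's code or API: the LenientMediaType class and its parse method are hypothetical. It still rejects a value that lacks a valid type/subtype, but silently drops any parameter that is not a non-empty name=value pair, and keeps the first occurrence of a repeated parameter name. As a simplification it splits on every ';', so quoted parameter values that contain ';' are not handled.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical lenient media type parser (NOT part of Apache Wink).
 * The "type/subtype" part is still mandatory; malformed parameters are skipped.
 */
final class LenientMediaType {
    final String type;
    final String subtype;
    final Map<String, String> parameters;

    private LenientMediaType(String type, String subtype, Map<String, String> parameters) {
        this.type = type;
        this.subtype = subtype;
        this.parameters = Collections.unmodifiableMap(parameters);
    }

    static LenientMediaType parse(String header) {
        // Simplification: split on every ';' (limit -1 keeps empty segments),
        // so quoted values containing ';' are not supported.
        String[] parts = header.split(";", -1);
        String[] typeParts = parts[0].trim().split("/", -1);
        if (typeParts.length != 2 || typeParts[0].trim().isEmpty() || typeParts[1].trim().isEmpty()) {
            // Without a usable type/subtype there is nothing to recover.
            throw new IllegalArgumentException("Not a type/subtype: " + header);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq <= 0) {
                continue; // empty segment, bare token ("UTF-8"), or "charset: UTF-8"
            }
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = p.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip surrounding quotes
            }
            if (name.isEmpty() || value.isEmpty()) {
                continue; // e.g. "charset=" with no value
            }
            params.putIfAbsent(name, value); // first occurrence wins
        }
        return new LenientMediaType(typeParts[0].trim().toLowerCase(Locale.ROOT),
                typeParts[1].trim().toLowerCase(Locale.ROOT), params);
    }

    public static void main(String[] args) {
        // The media types reported in the mail above.
        String[] samples = {
            "text/html;;charset=utf-8",
            "text/html; charset: UTF-8",
            "text/html; charset=",
            "text/html; $str_charset; charset=ISO-8859-1",
            "text/html; UTF-8;charset=ISO-8859-1",
            "text/html; utf-8",
            "text/html; UTF-8;charset=UTF-8"
        };
        for (String s : samples) {
            LenientMediaType m = parse(s);
            System.out.println(s + " -> " + m.type + "/" + m.subtype + " " + m.parameters);
        }
    }
}
```

Run against the values listed above, charset is recovered for the aol.com, comingsoon.net, globalsources.com and warnerbros.com headers, while the plugrush.com, torrentleech.org and dic.academic.ru headers parse as plain text/html with no parameters. Dropping a bad parameter rather than guessing (for example, reading "charset: UTF-8" or a bare "utf-8" as a charset) is a deliberate choice in this sketch; a crawler might prefer a looser heuristic.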