I think making this change is fine. We'd have to ignore the "malformed"
parameters, unless someone has a better idea?
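
To make "ignore the malformed parameters" concrete, here is a minimal sketch of lenient parsing. It is not Wink's actual code: the class name LenientMediaType and its parse method are hypothetical, and the rule it applies (keep only well-formed name=value parameters, silently drop everything else, still reject a broken type/subtype) is just one assumed reading of the proposal.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Wink): parses a media type, skipping bad parameters. */
final class LenientMediaType {
    public final String type;
    public final String subtype;
    public final Map<String, String> parameters;

    private LenientMediaType(String type, String subtype, Map<String, String> parameters) {
        this.type = type;
        this.subtype = subtype;
        this.parameters = Collections.unmodifiableMap(parameters);
    }

    public static LenientMediaType parse(String header) {
        if (header == null) {
            throw new IllegalArgumentException("media type is null");
        }
        // Simplification: quoted parameter values containing ';' are not handled.
        String[] parts = header.split(";", -1);

        // The type/subtype part is still validated strictly.
        String[] names = parts[0].trim().split("/", -1);
        if (names.length != 2 || names[0].trim().isEmpty() || names[1].trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Verify that the format is like \"type/subtype\": " + header);
        }

        Map<String, String> params = new LinkedHashMap<String, String>();
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue; // empty segment, no '=', or empty name: ignore
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = param.substring(eq + 1).trim();
            if (name.isEmpty() || value.isEmpty()) {
                continue; // e.g. "charset=" with no value: ignore
            }
            params.put(name, value);
        }
        return new LenientMediaType(names[0].trim(), names[1].trim(), params);
    }
}
```

With this sketch, LenientMediaType.parse("text/html; UTF-8;charset=ISO-8859-1") yields type "text", subtype "html", and a single charset parameter, while "text/html; charset: UTF-8" yields no parameters at all rather than an exception.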

On Mon, Sep 13, 2010 at 3:18 AM, Baram, Eliezer <[email protected]> wrote:
> And here is the mail he tried to post
>
> ---------- Forwarded message ----------
> From: Steve Miller <[email protected]>
> Date: Mon, Sep 12, 2010 at 2:15 AM
> Subject: Tolerance to malformed media types in Wink client
> To: [email protected]
>
> Hi
> I created a crawler using the Apache Wink client, but I found that the Wink
> client is not tolerant of malformed media types, even when the malformed part
> is only a media type parameter. Unfortunately, there are a lot of those on the
> Internet.
> When Wink receives such a media type, it throws an exception with the message:
> 'java.lang.IllegalArgumentException ... Verify that the format is like
> "type/subtype".'
> I think it would be good if Wink could be more tolerant of such media types,
> especially since they are common. It would surely save me time :-)
>
> Here are examples of the media types that cause the problem, with their
> sources. This is only a sample; the list of sites is longer, but the media
> type patterns repeat themselves.
>
> URL: http://www.aol.com/ (and all AOL sites around the globe)
> Media Type: text/html;;charset=utf-8
>
> URL: http://www.plugrush.com/
> Media Type: text/html; charset: UTF-8
>
> URL: http://www.torrentleech.org/
> Media Type: text/html; charset=
>
> URL: http://www.comingsoon.net/
> Media Type: text/html; $str_charset; charset=ISO-8859-1
>
> URL: http://www.globalsources.com/
> Media Type: text/html; UTF-8;charset=ISO-8859-1
>
> URL: http://dic.academic.ru/
> Media Type: text/html; utf-8
>
> URL: http://www.warnerbros.com/
> Media Type: text/html; UTF-8;charset=UTF-8
>
> Thanks,
> Steve
