Yep, I can't think of another way to make that work without it being overly 
complicated.  


For most of the examples below, it looks like just adding logic that ignores
the param when there's a key without a value would do the trick.
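
To make the idea concrete, here is a rough sketch of that "ignore the bad param" logic. This is not Wink's actual parsing code; the class name LenientMediaTypeParams and the method parseParams are made up for illustration, and it assumes parameter values never contain a quoted ';'. It keeps only well-formed key=value parameters and silently drops everything else.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Wink: a sketch of lenient parameter parsing.
final class LenientMediaTypeParams {

    /**
     * Returns the parameters that follow "type/subtype" in a media type string,
     * skipping any segment that is not a key=value pair with a non-empty key
     * and a non-empty value.
     */
    static Map<String, String> parseParams(String mediaType) {
        Map<String, String> params = new LinkedHashMap<>();
        // Assumption: no quoted values containing ';', so a plain split is enough.
        String[] parts = mediaType.split(";");
        // parts[0] is the "type/subtype" portion; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // empty segment, no '=', or empty key
            }
            String key = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            if (key.isEmpty() || value.isEmpty()) {
                continue; // key without a value (or the reverse): ignore it
            }
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Media type strings taken from the examples quoted in this thread.
        String[] samples = {
            "text/html;;charset=utf-8",
            "text/html; charset: UTF-8",
            "text/html; charset=",
            "text/html; $str_charset; charset=ISO-8859-1",
            "text/html; utf-8"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parseParams(s));
        }
    }
}
```

With this approach, a header such as "text/html; UTF-8;charset=ISO-8859-1" keeps its valid charset parameter, while "text/html; utf-8" simply ends up with no parameters instead of raising an exception.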



----- Original Message ----
From: Bryant Luk <[email protected]>
To: [email protected]
Sent: Mon, September 13, 2010 9:43:17 AM
Subject: Re: FW: Tolerance to malformed media types in Wink client

I think making this change is fine.  I think we'd have to ignore the
"malformed" parameters unless someone has a better idea?

On Mon, Sep 13, 2010 at 3:18 AM, Baram, Eliezer <[email protected]> wrote:
> And here is the mail he tried to post
>
> ---------- Forwarded message ----------
> From: Steve Miller <[email protected]>
> Date: Mon, Sep 12, 2010 at 2:15 AM
> Subject: Tolerance to malformed media types in Wink client
> To: [email protected]
>
> Hi
> I created a crawler using the Apache Wink client, but I found that the Wink
> client does not tolerate malformed media types, even when the malformed part
> is only a media type parameter. Unfortunately, there are a lot of those on
> the Internet.
> When Wink receives such a media type, it throws an exception with the message:
> 'java.lang.IllegalArgumentException ... Verify that the format is like
> "type/subtype".'
> I think it would be good if Wink could be more tolerant of such media types,
> especially since they are common. It would surely ease my time :-)
>
> Here are examples of the media types that cause the problem, with their
> sources. This is a sample; the list of sites is longer, but the media type
> patterns repeat.
>
> URL:   http://www.aol.com/   (and all aol sites around the globe)
> Media Type: text/html;;charset=utf-8
>
> URL: http://www.plugrush.com/
> Media Type: text/html; charset: UTF-8
>
> URL: http://www.torrentleech.org/
> Media Type: text/html; charset=
>
> URL: http://www.comingsoon.net/
> Media Type: text/html; $str_charset; charset=ISO-8859-1
>
> URL: http://www.globalsources.com/
> Media Type: text/html; UTF-8;charset=ISO-8859-1
>
> URL: http://dic.academic.ru/
> Media Type: text/html; utf-8
>
> URL: http://www.warnerbros.com/
> Media Type: text/html; UTF-8;charset=UTF-8
>
> Thanks,
> Steve