I think making this change is fine. We'd probably have to ignore the "malformed" parameters, unless someone has a better idea?
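
To make the idea concrete, here is a rough sketch of what ignoring the malformed parameters could look like. It is plain Java and not Wink's actual parser: the names LenientMediaTypeParser, Parsed and parse are made up for illustration. It still rejects a value that has no valid "type/subtype" part, but silently drops any parameter that is not a well-formed name=value pair. It splits naively on ';', so quoted parameter values containing ';' are not handled.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Wink or JAX-RS.
class LenientMediaTypeParser {

    // Simple holder for the parsed result (illustrative only).
    static final class Parsed {
        final String type;
        final String subtype;
        final Map<String, String> parameters;

        Parsed(String type, String subtype, Map<String, String> parameters) {
            this.type = type;
            this.subtype = subtype;
            this.parameters = Collections.unmodifiableMap(parameters);
        }
    }

    static Parsed parse(String header) {
        if (header == null) {
            throw new IllegalArgumentException("media type is null");
        }
        // Keep empty segments (limit -1) so inputs like ";" or ";;" are handled.
        String[] parts = header.split(";", -1);

        // The type/subtype part stays strict: it must be exactly "type/subtype".
        String[] names = parts[0].trim().split("/", -1);
        if (names.length != 2 || names[0].isEmpty() || names[1].isEmpty()) {
            throw new IllegalArgumentException(
                "Verify that the format is like \"type/subtype\": " + header);
        }

        // Parameters are lenient: anything that is not name=value is skipped.
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq <= 0) {
                continue; // empty segment, no '=', or missing name
            }
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = p.substring(eq + 1).trim();
            if (name.isEmpty() || value.isEmpty()) {
                continue; // e.g. "charset=" with nothing after it
            }
            params.put(name, value);
        }
        return new Parsed(names[0].toLowerCase(Locale.ROOT),
                          names[1].toLowerCase(Locale.ROOT), params);
    }
}
```

With this, "text/html;;charset=utf-8" from the list below would parse to text/html with charset=utf-8, and "text/html; charset: UTF-8" to text/html with no parameters at all.
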

On Mon, Sep 13, 2010 at 3:18 AM, Baram, Eliezer <[email protected]> wrote:
> And here is the mail he tried to post
>
> ---------- Forwarded message ----------
> From: Steve Miller <[email protected]>
> Date: Mon, Sep 12, 2010 at 2:15 AM
> Subject: Tolerance to malformed media types in Wink client
> To: [email protected]
>
> Hi
> I created a crawler using the Apache Wink client, but I found out that the
> Wink client is not tolerant of malformed media types, even if the malformed
> part is only a media type parameter. Unfortunately there are a lot of those
> on the Internet.
> When Wink receives such a media type it throws an exception with the message:
> 'java.lang.IllegalArgumentException ... Verify that the format is like
> "type/subtype".'
> I think it would be good if Wink could be more tolerant of such media types,
> especially since they are common. It would surely make my life easier :-)
>
> Here are examples of the media types that cause the problem, and their
> sources. This is only a sample; the list of sites is longer, but the media
> type patterns repeat themselves.
>
> URL: http://www.aol.com/ (and all aol sites around the globe)
> Media Type: text/html;;charset=utf-8
>
> URL: http://www.plugrush.com/
> Media Type: text/html; charset: UTF-8
>
> URL: http://www.torrentleech.org/
> Media Type: text/html; charset=
>
> URL: http://www.comingsoon.net/
> Media Type: text/html; $str_charset; charset=ISO-8859-1
>
> URL: http://www.globalsources.com/
> Media Type: text/html; UTF-8;charset=ISO-8859-1
>
> URL: http://dic.academic.ru/
> Media Type: text/html; utf-8
>
> URL: http://www.warnerbros.com/
> Media Type: text/html; UTF-8;charset=UTF-8
>
> Thanks,
> Steve
