Hey Oli,

Sorry for the delay.

Often this line (with encoding declared) helps:

<?xml version="1.0" encoding="UTF-8" ?>

My understanding is the feed should be UTF-8 by default, unless specified
otherwise. For sites that do not respect this convention (and I know
they're out there) my advice would be to keep track of the original
Content Type encoding when you initially poll the feed (to discover the
hub link), and use that going forward.

Hope that helps,
-Brett

On Sat, Jan 1, 2011 at 1:02 PM, OliB <[email protected]> wrote:
> I'm having an issue with feeds that don't declare their content
> encoding in the XML processing instruction.
>
> The content distribution defined in the PSHB spec (7.3) doesn't appear
> to allow specification of a character encoding. The Content-Type is
> defined as "application/rss+xml" or "application/atom+xml".
>
> This would not be an issue if the feed XML specified the encoding in
> the XML processing instruction however not all feeds do.
>
> For example, Google Alert Feeds:
>
> http://www.google.com/alerts/feeds/08979446703162538414/13217883862269731888
>
> I simply HTTP GET a Google Alert feed as the HTTPResponse reports
> "Content-Type:text/xml; charset=UTF-8". This allows me to decode it
> correctly. However as part of a Content Distribution I don't have this
> information.
>
> The HTTP standard http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
> states (in 3.7.1) the default should be ISO-8859-1. So my Java
> HttpServletRequest reader appears to be defaulting to this content
> type, which means I can't decode the stream correctly.
>
> Simplest thing would be to get some Googler to fix the Google Alert
> XML ;-)
>
> Or could we consider specifying that HUBs replicate the charset from
> the Fetch through into the Content-Distribution?
>
> OliB
