Re: [PDX.rb] unwanted multibyte characters from html entities

Chris Anderson Tue, 08 May 2007 16:17:11 -0700

Javan,

You might try using Hpricot to parse the RSS feed. It does a fine job
with all the strange characters I throw at it from HTML, although
using Hpricot for XML is a (supported) corner case.


http://code.whytheluckystiff.net/hpricot/

At Grabb.it we're using REXML to parse RSS feeds. I haven't really put
it through its paces, but I haven't noticed any problems either.
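
For illustration, here is a minimal sketch of what parsing a feed with
REXML might look like. It is not the code Grabb.it runs. It assumes a
modern Ruby with the rexml library available, and both SAMPLE_FEED and
feed_items are hypothetical names made up for this example. A real
feed would be fetched over HTTP rather than embedded as a string.

```ruby
require "rexml/document"

# Hypothetical sample feed, embedded so the example is self-contained.
# The title uses numeric character references for curly quotes; the
# description carries an escaped HTML entity, as many feeds do.
SAMPLE_FEED = <<~XML
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <item>
        <title>&#8216;Quoted&#8217; title</title>
        <description>Body with &amp;lsquo;escaped&amp;rsquo; entities</description>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns one { title:, description: } hash per <item>.
def feed_items(xml)
  doc = REXML::Document.new(xml)
  items = []
  doc.elements.each("rss/channel/item") do |item|
    items << {
      title: item.elements["title"]&.text,
      description: item.elements["description"]&.text
    }
  end
  items
end

items = feed_items(SAMPLE_FEED)
items.each do |item|
  # bytesize exceeds length whenever the string holds multibyte characters.
  puts "#{item[:title]} (#{item[:title].length} chars, #{item[:title].bytesize} bytes)"
  puts item[:description]
end
```

Note that REXML resolves the numeric references into real curly-quote
characters, which take three bytes each in UTF-8, while the escaped
HTML entities in the description come back as literal entity text. One
possible cause of "strange multibyte characters" is that such UTF-8
text ends up stored or displayed as if it were a single-byte encoding,
but that is a guess without seeing the actual feed.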

Good luck!

Chris

On 5/8/07, Javan Makhmali <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm using Ruby's stdlib RSS library to grab an RSS feed and tuck some
> information from it into a database -- essentially creating an
> archive of a feed. The problem I'm having is that some HTML entities
> (for example & l s q u o ; without the spaces) in the title
> and description are being mangled into strange multibyte characters
> that I'll avoid pasting into this message. Does anyone know why this
> happens and how I might fix / work around it?
>
> Best,
> Javan
> _______________________________________________
> PDXRuby mailing list
> [email protected]
> IRC: #pdx.rb on irc.freenode.net
> http://lists.pdxruby.org/mailman/listinfo/pdxruby
>
>


-- 
Chris Anderson
http://jchris.mfdz.com
