Javan,

You might try using Hpricot to parse the RSS feed. It does a fine job with all the strange characters I throw at it from HTML, although using Hpricot for XML is a (supported) corner case.
http://code.whytheluckystiff.net/hpricot/

At Grabb.it we're using REXML to parse RSS feeds. I haven't really put it through its paces, but I haven't noticed any problems either.

Good luck!
Chris

On 5/8/07, Javan Makhmali <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm using Ruby's stdlib RSS library to grab an rss feed and tuck some
> information from it into a database -- essentially creating an
> archive of a feed. The problem I'm having is that some html entities
> (like & l s q u o ; (without the spaces) for example) in the title
> and description are being mangled with strange multibyte characters
> that I'll avoid pasting into this message. Does anyone know why this
> happens and how I might fix / work around it?
>
> Best,
> Javan
> _______________________________________________
> PDXRuby mailing list
> [email protected]
> IRC: #pdx.rb on irc.freenode.net
> http://lists.pdxruby.org/mailman/listinfo/pdxruby
>

--
Chris Anderson
http://jchris.mfdz.com
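
For reference, here is a rough sketch of the REXML approach mentioned above: pull the title and description out of each item in a feed. The sample feed, the SAMPLE_FEED constant, and the feed_items helper are made up for illustration, and the feed is inlined rather than fetched. It does not show how REXML treats entities such as the one in the original question, so that would still need checking against the real feed.

```ruby
require 'rexml/document'

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <item>
        <title>First post</title>
        <description>Hello &amp; welcome</description>
      </item>
      <item>
        <title>Second post</title>
        <description>More news</description>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns one { title:, description: } hash per <item>.
def feed_items(xml)
  doc = REXML::Document.new(xml)
  items = []
  doc.elements.each('rss/channel/item') do |item|
    items << {
      # &. guards against items that lack a title or description element
      title: item.elements['title']&.text,
      description: item.elements['description']&.text
    }
  end
  items
end

feed_items(SAMPLE_FEED).each do |item|
  puts "#{item[:title]}: #{item[:description]}"
end
```

Each hash could then be written to the database to build the archive; that part is left out because it depends on the schema.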
