On Wed, Jan 12, 2011 at 10:55 PM, Rob Marscher <rmarsc...@beaffinitive.com> wrote:
>> On Wed, Jan 12, 2011 at 9:30 AM, Jim Yi <j...@jimyi.com> wrote:
>>> This problem is much better suited for an XML parser
> On Jan 12, 2011, at 9:39 AM, Randal Rust wrote:
>> I will have to try this out, because I am not sure that the approach I
>> was taking will work.
>
> I seem to remember having problems when I was using the DomDocument on rss
> feeds that were submitted by users and not under my control. Many of them
> were not well-formed and that caused DomDocument to not work. Getting the
> creator of the rss feed to fix it wasn't an option, but I did have some luck
> enabling the "recover" mode of libxml, which is available through the
> DomDocument->recover property -
> http://www.php.net/manual/en/class.domdocument.php#domdocument.props.recover
>
> Also, if the content is not encoded in utf8, you might need to run
> utf8_decode on the strings to get the right data because I believe libxml
> uses utf8 internally.
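The recover-mode approach can be sketched roughly as follows. This assumes PHP 7 or later with the bundled DOM extension. It parses strictly first and only falls back to libxml's recover mode (the `recover` property on DOMDocument) when the feed is not well-formed. The helper name `loadFeedLeniently` and the truncated sample feed are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: try a strict parse first, then retry with libxml's
// recover mode if the feed is not well-formed.
function loadFeedLeniently(string $xml): ?DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    if (!$ok) {
        // Retry on a fresh document with recover mode on, so libxml keeps
        // whatever tree it managed to build before hitting the errors.
        $doc = new DOMDocument();
        $doc->recover = true;
        $ok = $doc->loadXML($xml);
    }

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

// Hypothetical user-submitted feed, cut off before its closing tags.
$feed = '<rss version="2.0"><channel><item><title>First post</title>';

$doc = loadFeedLeniently($feed);
if ($doc !== null) {
    echo $doc->getElementsByTagName('title')->item(0)->textContent, "\n";
}
```

Recover mode only salvages what libxml could parse, so the resulting tree may be missing content. It is worth logging the errors from `libxml_get_errors()` rather than just clearing them if you need to know which feeds were repaired.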

If you are using DomDocument, passing everything through Tidy first is a good
idea.

http://us2.php.net/tidy
http://us2.php.net/manual/en/tidy.examples.basic.php

John Campbell
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation
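Running a feed through Tidy before DomDocument might look roughly like this. It is a minimal sketch that assumes the tidy extension is installed and that the feed is UTF-8. Tidy is put into XML mode (`input-xml` / `output-xml`) so it does not treat the RSS as HTML. The helper name `tidyThenLoad` and the sample feed are hypothetical.

```php
<?php
// Hypothetical helper: repair markup with Tidy, then load it into DOMDocument.
// Returns null if ext/tidy is missing or the repaired output still fails to load.
function tidyThenLoad(string $xml): ?DOMDocument
{
    if (!function_exists('tidy_repair_string')) {
        return null; // Tidy is not available in this PHP build
    }

    // Parse as generic XML, emit well-formed XML, and do not re-wrap text.
    $config = [
        'input-xml'    => true,
        'output-xml'   => true,
        'wrap'         => 0,
        'force-output' => true, // still produce output if Tidy reports errors
    ];
    $clean = tidy_repair_string($xml, $config, 'utf8');
    if ($clean === false) {
        return null;
    }

    // Keep libxml quiet while loading the repaired markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($clean);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

// Hypothetical feed, cut off before its closing tags.
$feed = '<rss version="2.0"><channel><item><title>First post</title>';

$doc = tidyThenLoad($feed);
if ($doc !== null) {
    echo trim($doc->getElementsByTagName('title')->item(0)->textContent), "\n";
}
```

Tidy rewrites the markup before libxml ever sees it, so DomDocument can usually parse strictly afterwards. Combining this with recover mode as a last resort is one reasonable layering.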