Re: [nyphp-talk] Regex for P Elements

John Campbell Wed, 12 Jan 2011 23:10:10 -0800

On Wed, Jan 12, 2011 at 10:55 PM, Rob Marscher
<[email protected]> wrote:
>> On Wed, Jan 12, 2011 at 9:30 AM, Jim Yi <[email protected]> wrote:
>>> This problem is much better suited for an XML parser
> On Jan 12, 2011, at 9:39 AM, Randal Rust wrote:
>> I will have to try this out, because I am not sure that the approach I
>> was taking will work.
>
> I seem to remember having problems when I was using the DomDocument on rss 
> feeds that were submitted by users and not under my control.  Many of them 
> were not well-formed and that caused DomDocument to not work.  Getting the 
> creator of the rss feed to fix it wasn't an option, but I did have some luck 
> enabling the "recovery" mode of libxml, which is available through the 
> DOMDocument->recover property - 
> http://www.php.net/manual/en/class.domdocument.php#domdocument.props.recover
>
> Also, if the content is not encoded in utf8, you might need to run 
> utf8_decode on the strings to get the right data because I believe libxml 
> uses utf8 internally.
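
For reference, the libxml recovery mode Rob mentions can be sketched
roughly like this. The feed string is a made-up example, not anything
from the thread, and how much libxml manages to salvage depends on how
broken the input is, so treat this as an illustration rather than a
recipe.

```php
<?php
// Hypothetical malformed feed: the <title> element is never closed.
$xml = '<rss><channel><title>Example feed</channel></rss>';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->recover = true;            // ask libxml to try to parse past errors
$loaded = $doc->loadXML($xml);

// Keep a copy of what libxml complained about, then reset its buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    echo trim($error->message), "\n";
}

if ($loaded) {
    // Whatever tree libxml was able to build from the broken input.
    echo $doc->saveXML();
}
```

Set the recover property before calling loadXML(), since it only
affects loads that happen afterwards.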


If you are using DomDocument, passing everything through Tidy first is
a good idea.
http://us2.php.net/tidy
http://us2.php.net/manual/en/tidy.examples.basic.php
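
A minimal sketch of what I mean, assuming the tidy extension is
installed. The input string and the config values are only
illustrative; pick whatever Tidy options suit your content.

```php
<?php
// Hypothetical input: paragraphs and a <b> tag that are never closed.
$html = '<p>First paragraph<p>Second <b>paragraph';

// Illustrative Tidy options: emit XHTML and do not wrap long lines.
$config = array(
    'output-xhtml' => true,
    'wrap'         => 0,
);

// tidy_repair_string() returns the repaired markup as a string.
$clean = tidy_repair_string($html, $config, 'utf8');

// Tidy's output should parse cleanly, but keep any leftover
// libxml warnings out of the page just in case.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($clean);
libxml_clear_errors();

// Now the P elements can be walked with the DOM instead of a regex.
foreach ($doc->getElementsByTagName('p') as $p) {
    echo trim($p->textContent), "\n";
}
```

Tidy closes the open tags before DomDocument ever sees the markup,
so the parser has much less to choke on.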

John Campbell
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation