Nathan Lane wrote:
> I want to make what in effect is a website scraper using PHP, but it isn't
> obvious how this would best be done. I've tried using DOMDocument and I'm
> not sure if that's the best option or not. I'd really like to use something
> where I could use XPath to get the elements out that I want. Recently I
> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
> suggestions?

I would agree with Alvaro and Walt. You could actually combine the two suggestions. I have done the following:

1. Download the page.
2. Run the page through tidy (to clean up the tags).
3. Apply an XSLT transform with DOM.
4. Retrieve the results.

This has worked really well in terms of speed and the amount of data I have processed. XSLT can contain logic, which is really nice. By using XSLT I can create various transformations, which gives greater flexibility and customization, and I can still use all the XML technologies like XPath.

-- 
thebigdog
_______________________________________________
UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
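
For anyone who wants to try the four steps above, here is a rough sketch of how they might fit together. It is not the exact code from the post. It assumes the tidy, dom, and xsl extensions are loaded and that allow_url_fopen is enabled. The names htmlToXhtml(), applyStylesheet(), scrapeLinks(), and LINKS_XSL are made up for illustration, and the stylesheet only lists link targets.

```php
<?php
// Sketch only. Assumes ext-tidy, ext-dom and ext-xsl are available.
// All function and constant names here are hypothetical.

// An example stylesheet: print the href of every link, one per line.
// local-name() is used so the match works whether or not the elements
// carry the XHTML namespace in tidy's output.
const LINKS_XSL = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//*[local-name()='a'][@href]">
      <xsl:value-of select="@href"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Step 2: run the page through tidy so it becomes well-formed XHTML.
function htmlToXhtml(string $html): string
{
    $config = [
        'output-xhtml'     => true,
        'numeric-entities' => true,   // named entities like &nbsp; would break the XML parser
        'doctype'          => 'omit', // no DTD needed for the transform
    ];
    return (string) tidy_repair_string($html, $config, 'utf8');
}

// Steps 3 and 4: apply the XSLT transform with DOM and return the result.
function applyStylesheet(string $xhtml, string $xsl): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml);

    $style = new DOMDocument();
    $style->loadXML($xsl);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);

    return (string) $proc->transformToXml($doc);
}

// Step 1 plus the rest: download the page, then clean and transform it.
function scrapeLinks(string $url): string
{
    $html = file_get_contents($url); // needs allow_url_fopen
    if ($html === false) {
        throw new RuntimeException("Could not download $url");
    }
    return applyStylesheet(htmlToXhtml($html), LINKS_XSL);
}
```

Calling scrapeLinks() with a page URL should return the link targets as plain text. Swapping in a different stylesheet changes what gets extracted without touching the PHP, which is where the flexibility comes from.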
