I forgot one thing: Scriptable Browser. http://www.lastcraft.com/browser_documentation.php
This makes it really easy to deal with forms, authentication, clicking on links, etc. Seriously, the combination of scriptable browser, tidy, and xpath makes scraping a piece of cake.

Alvaro

Alvaro Carrasco wrote:
> In my experience, the easiest way is: run website through tidy, load it
> into a DOMDocument, and use xpath.
>
> The xpath patterns are SO much easier to read and write than regex and
> more resistant to changes to the website (if you write them correctly).
> You can also use regex within xpath if you ever need it.
>
> Alvaro
>
> Nathan Lane wrote:
>> I want to make what in effect is a website scraper using PHP, but it isn't
>> obvious how this would best be done. I've tried using DOMDocument and I'm
>> not sure if that's the best option or not. I'd really like to use something
>> where I could use XPath to get the elements out that I want. Recently I
>> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
>> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
>> suggestions?
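
For what it's worth, here is a rough sketch of the tidy, DOMDocument, and xpath recipe quoted above. The sample markup, the extractLinks() helper, and the xpath query are all made up for illustration; a real scraper would fetch the page over HTTP first. The tidy step is skipped when the tidy extension isn't installed, since DOMDocument's HTML parser copes with mildly sloppy markup by itself.

```php
<?php
// Hypothetical sample markup (note the unclosed <li> tags).
$html = '<html><body><ul id="news">'
      . '<li><a href="/a">First</a>'
      . '<li><a href="/b">Second</a>'
      . '</ul></body></html>';

// Hypothetical helper: returns href/text pairs for every <a> matched by $query.
function extractLinks(string $html, string $query): array
{
    // Step 1: clean up the markup with tidy, if the extension is loaded.
    if (function_exists('tidy_repair_string')) {
        $html = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');
    }

    // Step 2: load into a DOMDocument, keeping parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 3: pull out the elements with xpath.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$links = extractLinks($html, '//ul[@id="news"]//a');
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

Anchoring the query on something stable like an id, rather than on the element's position in the page, is what tends to make it survive layout changes.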
