On 26 Jul 2006 22:40:09 -0000, [EMAIL PROTECTED]

Wondered whether I should make this NF or not, but seeing how it'll be done from VFP, I 
figured "yeah, it's on topic."

There's been talk recently and in the past about screen-scraping web pages.  Does anyone 
have a "best practice" way of doing this?


As always, it might depend on the situation. When I wrote a crawler in
VFP, I used KenX code for ExplorerX to automate an instance of IE from
VFP. Tell it to load the page; when done, traverse the document using
the Document Object Model to find what you're looking for. It's large,
it's memory- and processor-intensive, and it's got all the downsides
of using ActiveX and IE. But it's the Microsoft Way.
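A minimal sketch of that approach from VFP, using the InternetExplorer.Application COM server (the URL and the DOM traversal are placeholders; adapt them to the page you're scraping):

```foxpro
* Automate an IE instance via COM and walk the DOM once the page loads.
LOCAL loIE, loDoc, loLinks, lnI
loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Visible = .F.
loIE.Navigate("http://www.example.com/")   && hypothetical URL
* Wait for the page to finish loading (4 = READYSTATE_COMPLETE)
DO WHILE loIE.Busy OR loIE.ReadyState # 4
    WAIT WINDOW "" TIMEOUT 0.5
ENDDO
loDoc = loIE.Document
* Example traversal: list every anchor's href
loLinks = loDoc.getElementsByTagName("a")
FOR lnI = 0 TO loLinks.length - 1
    ? loLinks.item(lnI).href
ENDFOR
loIE.Quit()
```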

If you knew you were querying well-formed pages (say, those you had
written), you could use XMLDOM and XQuery. Faster and cooler.
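Something along these lines, using MSXML from VFP; note this sketch queries with XPath (the usual route via MSXML's selectNodes) rather than full XQuery, and the URL and query expression are placeholders:

```foxpro
* Load a well-formed XML/XHTML page into an MSXML DOM and query it.
LOCAL loXML, loNodes, lnI
loXML = CREATEOBJECT("MSXML2.DOMDocument")
loXML.async = .F.
loXML.setProperty("SelectionLanguage", "XPath")
IF loXML.load("http://www.example.com/page.xml")   && hypothetical URL
    * Hypothetical XPath: grab every item's title
    loNodes = loXML.selectNodes("//item/title")
    FOR lnI = 0 TO loNodes.length - 1
        ? loNodes.item(lnI).text
    ENDFOR
ELSE
    ? "Parse error: " + loXML.parseError.reason
ENDIF
```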

Finally, if you don't know what you might be running into, use wget
on the command line to grab entire pages, then parse the file on
disk. That avoids all the automation headaches, and it easily lets
you store the entire page contents, if that would be an added benefit.
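A rough sketch of that route, assuming wget is on the PATH; the URL, file name, and the tag being extracted are all placeholders:

```foxpro
* Shell out to wget to save the page, then parse the file on disk.
LOCAL lcHTML, lcTitle
RUN wget -q -O page.html "http://www.example.com/"   && hypothetical URL
lcHTML = FILETOSTR("page.html")
* Example parse: pull the text between <title> and </title>
lcTitle = STREXTRACT(lcHTML, "<title>", "</title>")
? lcTitle
* The full lcHTML string could also be stored in a memo field,
* e.g. INSERT INTO pages (url, html) VALUES (...), for later use.
```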


--
Ted Roche
Ted Roche & Associates, LLC
http://www.tedroche.com

