> Wondered whether I should make this NF or not, but seeing how
> it'll be done from VFP, I figured "yeah, it's on topic."
>
> There's been talk recently and in the past about
> screen-scraping web pages. Does anyone have a "best
> practice" way of doing this?
I've wrestled with this one, trying to pull a list of RV sites in the US into VFP. At first it looked like a cakewalk: name, address, telephone, and contact info were arranged vertically on the (long) page, separated by blank lines, and I think it was provided as one state per (long) page. What I did was, a page/state at a time, copy the page to the clipboard and then run a VFP process that read the clipboard and parsed its contents into individual records.

After about 100 problems, I finally got it to work, more or less, because the data on these pages is not necessarily structured the way it appears to be. In the cases I ran with, sometimes there would be one blank-line separator, sometimes multiple blank lines. Fine, fix that; then I discovered the data had never been validated, so it was incomplete and contained missing/transposed fields, etc., and basically required manual cleaning afterwards. And this was a case where the data appeared to be structured and amenable to screen scraping.

I suppose the flip side is where the data IS properly structured, formatted, and validated, perhaps by an organization that intends to distribute data this way, but then you'd think they would support other ways to get it than screen scraping.

Bill

> tia,
> --Michael

_______________________________________________
Post Messages to: [email protected]
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
** All postings, unless explicitly stated otherwise, are the opinions of the author, and do not constitute legal or medical advice. This statement is added to the messages for those lawyers who are too stupid to see the obvious.
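For what it's worth, the parsing step Bill describes — clipboard text where each record is a vertical block of lines and records are separated by a *variable* number of blank lines — can be sketched in a few lines. His process was in VFP (reading the clipboard via _CLIPTEXT); the sketch below shows only the splitting logic, in Python, with made-up sample data. It is not his code, just one way to handle the "sometimes one blank line, sometimes several" problem by splitting on runs of blank lines:

```python
import re

def parse_records(text):
    """Split scraped page text into records. Records are blocks of
    non-blank lines separated by one OR MORE blank lines (the regex
    absorbs any run of blank lines into a single separator)."""
    blocks = re.split(r"\n\s*\n", text.strip())
    records = []
    for block in blocks:
        # Keep each record as a list of its trimmed, non-empty lines;
        # downstream code would map these to name/address/phone fields.
        fields = [line.strip() for line in block.splitlines() if line.strip()]
        if fields:
            records.append(fields)
    return records

# Hypothetical sample: two records, separated by MULTIPLE blank lines.
sample = ("Shady Pines RV Park\n"
          "123 Main St, Springfield\n"
          "555-0100\n"
          "\n\n\n"
          "Lakeside Camp\n"
          "456 Shore Rd, Lakeville\n"
          "555-0199")

for rec in parse_records(sample):
    print(rec)
```

Of course, as Bill notes, this only solves the separator problem; missing or transposed fields within a block still need validation or manual cleanup afterwards.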

