I've got the HTML source into a reasonable shape for processing with line and item chunk expressions by using:
  put field "fld Page Source Code" into tHTML
  replace "/div>" with "/div>" & return in tHTML
  replace "/tr>" with "/tr>" & return in tHTML
  replace "/td>" with "/td>" & tab in tHTML
  filter tHTML with <strings that isolate only the interesting, data-laden table rows>

So, I can now have line-level chunk expressions mapped to divs and table row tags, together with item-level expressions for iterating through the tags and their attributes within table rows. Nice!

Now the rich seams have been revealed, it's time to start digging out them there nuggets! :-)

Best,
Keith..

On 12 Jun 2011, at 11:56, Keith Clarke wrote:

> Thanks for the steer Stephen - I have Remo but hadn't discovered Jerry's
> tutorials before. Much to study there.
>
> The screen-scraping lessons start from the premise that the HTML source is
> already reasonably structured into lines - for filtering, etc - so it doesn't
> help with my challenge of getting the page source into the state where I can
> apply some of these techniques.
>
> However, it did make me think of experimenting with the replace function -
> replace "/*>" with "/*>" & return in tHTML - to soft-wrap the HTML by tag.
>
> So, I think I'm on the right track now.
> Best,
> Keith..
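
P.S. For anyone wanting to try the same thing, here is a minimal sketch of how the line and item chunks might then be walked once the source has been soft-wrapped as described above. It is only an illustration under some assumptions: the field "fld Results" and the "*<td*" filter pattern are hypothetical stand-ins (the real filter strings depend on the page being scraped), and the itemDelimiter is set to tab because items are comma-delimited by default.

```livecode
-- A sketch only: field "fld Results" and the "*<td*" filter pattern are
-- hypothetical stand-ins, not taken from the original script.
on mouseUp
   local tHTML, tRow, tCell, tReport
   put field "fld Page Source Code" into tHTML

   -- Soft-wrap the source: one table row per line, one cell per tab-separated item
   replace "/tr>" with "/tr>" & return in tHTML
   replace "/td>" with "/td>" & tab in tHTML

   -- Keep only the lines that contain a table cell (assumed pattern)
   filter tHTML with "*<td*"

   -- Items are comma-delimited by default, so switch to tab
   set the itemDelimiter to tab

   repeat for each line tRow in tHTML
      repeat for each item tCell in tRow
         -- Skip whatever is left after the last tab (e.g. the closing </tr>)
         if tCell contains "<td" then
            put tCell & return after tReport
         end if
      end repeat
      put return after tReport -- blank line between rows
   end repeat

   put tReport into field "fld Results"
end mouseUp
```

Using "repeat for each" rather than "repeat with x = 1 to the number of lines" should keep this quick on long pages, since the chunks are read in a single pass.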