I've got the HTML source into a reasonable shape for processing with line and 
item chunk expressions by using:

put field "fld Page Source Code" into tHTML
replace "/div>" with "/div>" & return in tHTML
replace "/tr>" with "/tr>" & return in tHTML
replace "/td>" with "/td>" & tab in tHTML
filter tHTML with <strings that isolate only the interesting, data-laden table 
rows>

So I now have line-level chunk expressions mapped to divs and table rows, 
together with item-level expressions (with the itemDelimiter set to tab) for 
iterating through the cell tags and their attributes within each row. Nice!
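
For anyone following along, here is a rough sketch of how that line/item iteration might look. The "*<td*" filter pattern and the extractCells handler name are just stand-ins I made up for illustration (not the filter strings I actually use), and it assumes each interesting row ends up on a single line after the replaces:

```livecode
-- Sketch only: "*<td*" and the handler name extractCells are assumptions
-- for illustration, not the real filter strings used on the page.
function extractCells pHTML
   local tResult, tRow, tCell, tText
   replace "/tr>" with "/tr>" & return in pHTML   -- one table row per line
   replace "/td>" with "/td>" & tab in pHTML      -- one cell per tab-delimited item
   filter pHTML with "*<td*"                      -- keep only lines that contain a cell
   set the itemDelimiter to tab
   repeat for each line tRow in pHTML
      repeat for each item tCell in tRow
         -- strip whatever tags remain, leaving just the cell text
         put replaceText(tCell, "<[^>]*>", empty) into tText
         if tText is not empty then put tText & tab after tResult
      end repeat
      -- drop the trailing tab before closing the output line
      if the last char of tResult is tab then delete the last char of tResult
      put return after tResult
   end repeat
   return tResult
end extractCells
```

Calling extractCells(field "fld Page Source Code") should hand back one line per table row with the cell text separated by tabs, which is then easy to pick apart with further chunk expressions.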

Now that the rich seams have been revealed, it's time to start digging out 
them there nuggets! :-) 
Best,
Keith..

On 12 Jun 2011, at 11:56, Keith Clarke wrote:

> Thanks for the steer Stephen - I have Remo but hadn't discovered Jerry's 
> tutorials before. Much to study there.
> 
> The screen-scraping lessons start from the premise that the HTML source is 
> already reasonably structured into lines - for filtering, etc - so it doesn't 
> help with my challenge of getting the page source into the state where I can 
> apply some of these techniques.
> 
> However, it did make me think of experimenting with the replace function - 
> replace "/*>" with "/*>" & return in tHTML - to soft-wrap the HTML by tag. 
> 
> So, I think I'm on the right track now.
> Best,
> Keith..

