Thanks for the insights Jim (and Stephen) - all very useful.
A list of stuff is now emerging from the depths of the page. The only problem I 
have now is some stubborn '&nbsp;' characters that don't respond to 
'filter without' using either "&nbsp;" or numToChar(160).
Any ideas?
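
One thing that may be relevant: as far as I understand it, filter keeps or 
drops whole lines that match a wildcard pattern, so it cannot strip individual 
characters out of a line, whereas replace works character by character. Below 
is a minimal sketch of a replace-based clean-up. The handler name stripNbsp is 
made up for illustration, and the sketch assumes the page text is held as 
single-byte Latin-1 / Windows-1252 data, where the non-breaking space is 
character 160.

```livecode
-- Hypothetical helper (not from the thread): turn non-breaking spaces
-- into ordinary spaces. Assumes single-byte Latin-1 / Windows-1252 text.
function stripNbsp pText
   -- the literal HTML entity, in case it is still present in the source
   replace "&nbsp;" with space in pText
   -- the raw non-breaking space character (160 in Latin-1 / Windows-1252)
   replace numToChar(160) with space in pText
   return pText
end stripNbsp
```

It would be called as: put stripNbsp(tHTML) into tHTML. If the assumption 
about the encoding is wrong, the second replace needs adjusting rather than 
extending: in raw UTF-8 data the non-breaking space is the two-byte sequence 
numToChar(194) & numToChar(160), and in MacRoman (the native encoding on Mac 
OS) it is numToChar(202), while 160 there is a dagger. Replacing a bare 160 in 
UTF-8 data could damage other characters, so only the variant that matches the 
actual encoding should be used.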
Best,
Keith..

On 12 Jun 2011, at 14:18, Jim Ault wrote:

> I forgot to mention the old frames style if you are looking into archives on 
> old sites,
> and <IFRAME> on newer sites, easy to detect, but now you have a second <head> 
> </head> <body> </body>.
> 
> On Jun 12, 2011, at 4:14 AM, Keith Clarke wrote:
> 
>> I've got the HTML source into a reasonable shape for processing with line 
>> and item chunk expressions by using:
>> 
>> put field "fld Page Source Code" into tHTML
>> replace "/div>" with "/div>" & return in tHTML
>> replace "/tr>" with "/tr>" & return in tHTML
>> replace "/td>" with "/td>" & tab in tHTML
>> filter tHTML with <strings that isolate only the interesting, data-laden 
>> table rows>
>> 
>> So, I can now have line-level chunk expressions mapped to divs and table row 
>> tags, together with item-level expressions for iterating through the tags 
>> and their attributes within table rows. Nice!
>> 
>> Now the rich seams have been revealed, it's time to start digging out them 
>> there nuggets! :-)
> 
> Jim Ault
> Las Vegas

