Hi folks,
Local rainy Saturday night broadband load prevented me from seeing the whole of
Colin Holgate's fascinating LiveCode Live presentation on working with web page
source HTML text - so I can't wait for the recording!
Meanwhile, I'm trying to extract various html tags and specific
Jerry Daniels has an excellent series on screen scraping. Several video
lessons.
http://revmentor.com/business-logic-screen-scraping-1
On 12 June 2011 02:27, Keith Clarke keith.cla...@clarkeandclarke.co.uk wrote:
Hi folks,
Local rainy Saturday night broadband load prevented me from seeing the
Thanks for the steer Stephen - I have Remo but hadn't discovered Jerry's
tutorials before. Much to study there.
The screen-scraping lessons start from the premise that the HTML source is
already reasonably structured into lines - for filtering, etc - so it doesn't
help with my challenge of
I've got the HTML source into a reasonable shape for processing with line and
item chunk expressions by using:
put field "fld Page Source Code" into tHTML
replace "</div>" with "</div>" & return in tHTML
replace "</tr>" with "</tr>" & return in tHTML
replace "</td>" with "</td>" & tab in tHTML
filter tHTML with strings that
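
For anyone following along outside LiveCode, here is a minimal sketch of the same idea in Python (not LiveCode, and not Keith's actual script): put a line break after each closing div and tr tag and a tab after each closing td tag, so the source can then be picked apart by line and by tab-delimited item. The function name structure_rows and the sample table are made up for illustration.

```python
def structure_rows(html_text: str) -> str:
    """Insert line breaks after </div> and </tr>, and tabs after </td>."""
    text = html_text.replace("</div>", "</div>\n")
    text = text.replace("</tr>", "</tr>\n")
    text = text.replace("</td>", "</td>\t")
    return text


# Hypothetical sample: a two-row table on a single line of source.
source = "<table><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></table>"

structured = structure_rows(source)
lines = structured.split("\n")                # one table row per line
rows = [line.split("\t") for line in lines]   # one cell per tab-delimited item

for row in rows:
    print(row)
```

Each table row ends up on its own line, and the cells within it are tab-delimited items, which is what makes the line-level filtering step workable afterwards.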
I forgot to mention the old frames style if you are looking into
archives on old sites,
and IFRAME on newer sites, easy to detect, but now you have a second
<head> </head> <body> </body>.
On Jun 12, 2011, at 4:14 AM, Keith Clarke wrote:
I've got the HTML source into a reasonable shape for
Okay, 4 more that were not on the previous list
&quot; &amp; &lt; &gt;
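
A small sketch of decoding those four entities, written in Python rather than LiveCode, in case it helps to see the replacements end to end. html.unescape is from the Python standard library; decode_by_replace is a hypothetical helper that mirrors a chain of plain replace steps. The one thing to watch in the manual version is the order: the ampersand entity goes last, otherwise text that was escaped twice gets decoded twice.

```python
import html


def decode_by_replace(text: str) -> str:
    """Decode the four basic entities with plain replaces, ampersand last."""
    for entity, char in (
        ("&quot;", '"'),
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&amp;", "&"),  # must come last
    ):
        text = text.replace(entity, char)
    return text


sample = "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; said &quot;hi&quot;"

manual = decode_by_replace(sample)
library = html.unescape(sample)  # standard-library equivalent

print(manual)
print(library)
```

Both calls give the same result for this sample. The library call also covers the long tail of named and numeric entities that a hand-written list would miss.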
On Jun 12, 2011, at 4:14 AM, Keith Clarke wrote:
I've got the HTML source into a reasonable shape for processing with
line and item chunk expressions by using:
Jim Ault
Las Vegas
Thanks for the insights Jim (and Stephen) - all very useful.
A list of stuff is now emerging from the depths of the page. The only problem I
have now is some stubborn '&nbsp;' characters that don't respond to filtering
without &nbsp; or numToChar(160).
Any ideas?
Best,
Keith..
On 12 Jun 2011, at
nbsp means a nonbreaking space. most html renderers remove double spaces, for
historical reasons as far as i know. thus the nbsp was introduced, and can
appear anywhere in a text, most often to do basic indentation. however, filter
only works on full lines, and is thus not helpful with that. you
On Jun 12, 2011, at 6:42 AM, Keith Clarke wrote:
Thanks for the insights Jim (and Stephen) - all very useful.
A list of stuff is now emerging from the depths of the page. The
only problem I have now is some stubborn '&nbsp;' characters that
don't respond to filtering without &nbsp; or
Whoops - sorry to make you repeat yourself Jim (and thanks Björnke for the
reminder why I need to 'replace', as 'filter' only works at line level)
Still, sorted - I now have a list of stuff out of one page to take along to the
next hurdle! ;-)
Best,
Keith..
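
To make the filter-versus-replace distinction concrete, here is a rough Python sketch (not LiveCode) of the two behaviours. drop_lines_containing is a hypothetical stand-in for a line-level "filter ... without", and replace_nbsp is a stand-in for a character-level replace; the sample text is invented.

```python
NBSP_CHAR = "\u00a0"  # code point 160, the character numToChar(160) refers to


def drop_lines_containing(text: str, needle: str) -> str:
    """Line-level: every line containing needle is removed whole."""
    return "\n".join(line for line in text.split("\n") if needle not in line)


def replace_nbsp(text: str) -> str:
    """Character-level: only the non-breaking space goes, the line stays."""
    return text.replace("&nbsp;", " ").replace(NBSP_CHAR, " ")


page = "Item&nbsp;one\nItem two\nItem\u00a0three"

filtered = drop_lines_containing(page, "&nbsp;")
replaced = replace_nbsp(page)

print(repr(filtered))
print(repr(replaced))
```

The filtered version loses the whole first line, while the replaced version keeps all three lines and only swaps the non-breaking spaces for ordinary ones. Both the entity form and the raw character are handled, since a page source can contain either.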
On 12 Jun 2011, at 14:58, Jim Ault
Not sure how my other reply ended up on the wrong thread! Take two...
Here is a different approach:
If your source is XHTML you can treat it as XML, and go through nodes that way.
If it's not XHTML, look at this tool:
http://www.ibm.com/developerworks/xml/library/x-tiptidy/index.html
You
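
A minimal sketch of the "treat XHTML as XML and walk the nodes" idea, in Python with the standard-library ElementTree parser rather than LiveCode's XML library. The sample document is invented and deliberately has no namespace declaration; real XHTML normally declares one, in which case the tag names need the namespace prefix. As far as I know this parser also rejects undeclared named entities such as the non-breaking space one, so those would need replacing first.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample; real pages usually need tidying first.
xhtml = (
    "<html><body>"
    "<div><p>One</p><p>Two</p></div>"
    "<div><p>Three</p></div>"
    "</body></html>"
)

root = ET.fromstring(xhtml)

# Walk every <p> node anywhere in the tree and collect its text.
paragraphs = [p.text for p in root.iter("p")]

# Count direct <div> children of <body>.
div_count = len(root.findall("./body/div"))

print(paragraphs)
print(div_count)
```

The point of going through nodes is that nothing depends on how the source happens to be broken into lines, which is the weak spot of the replace-and-filter approach.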
Thanks Colin. Actually, I think this is the same Tidy project as I reached in
my recent thread about my quest to get TextWrangler to soft-wrap HTML text
files by tags and attributes (still can't get the syntax quite right).
I hadn't thought of trying to access it direct from LiveCode though.
Anything in particular? My thing was really all about how to do things as a
beginner, there are fancier more obscure ways to do the same things. Not
necessarily better, but sometimes more powerful.
On Jun 12, 2011, at 1:44 PM, Keith Clarke wrote:
BTW nice presentation last night and really
I am a LiveCode novice (1 year, so still a Rookie!). So, part of the challenge
with LiveCode (and indeed, software development in general for me) is
understanding the art of the possible.
These short, worked example video demonstrations - with a low level of
abstraction from real world
On Jun 12, 2011, at 1:15 PM, Keith Clarke wrote:
I am a LiveCode novice (1 year, so still a Rookie!). So, part of
the challenge with LiveCode (and indeed, software development in
general for me) is understanding the art of the possible.
If you have a URL, I could give some concrete
On Jun 12, 2011, at 5:24 PM, Jim Ault wrote:
On Jun 12, 2011, at 1:15 PM, Keith Clarke wrote:
I am a LiveCode novice (1 year, so still a Rookie!). So, part of the
challenge with LiveCode (and indeed, software development in general for me)
is understanding the art of the possible.
If
/*** Formatting HTML code ***
indentHtml v.0.1.0, 13 June 2011
Adjust kTags to define tags that need indentation.
Parameters:
theHtml: any valid HTML source code;
theTabSpaces: the number of positions occupied by one tab character
theSoftWrapCol: the column (=position number) after which lines are