Re: How to extract an entire element from an HTML file?

2018-11-26 Thread J. Landman Gay via use-livecode

On 11/26/18 1:46 PM, Tom Glod via use-livecode wrote:

> I've been thinking about a simple html parser as well to extract email
> addresses or urls from a page.
>
> Tools that might help
>
> 1. regular expressions
>
> ...

I've posted this link before but it is worth reading more than once. :)


--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: How to extract an entire element from an HTML file?

2018-11-26 Thread Tom Glod via use-livecode
I've been thinking about a simple html parser as well to extract email
addresses or urls from a page.

Tools that might help

1. regular expressions
2. item delimiter and chunks (set the itemDelimiter to the tag you are trying to
extract)
3. replace command
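A minimal sketch of the first two tools, written in Python purely for illustration. In LiveCode the same roles would be played by the matchText function and the itemDelimiter property. The patterns, function names and sample page are assumptions made up for this example. They only cover simple, well-quoted markup, and regex-based scraping in particular is easy to fool on real-world HTML.

```python
import re

# Deliberately simple patterns: fine for a quick scrape, not a full validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_with_regex(html):
    """Tool 1: regular expressions. Returns (emails, href values)."""
    return EMAIL_RE.findall(html), HREF_RE.findall(html)


def hrefs_by_splitting(html):
    """Tool 2: split on a tag, much as one would set LiveCode's itemDelimiter
    to "<a " and walk the items. Only handles double-quoted href values."""
    urls = []
    for item in html.split("<a ")[1:]:  # the first item is the text before any link
        tag = item.split(">", 1)[0]      # keep just the rest of the opening tag
        start = tag.find('href="')
        if start == -1:
            continue
        start += len('href="')
        end = tag.find('"', start)
        if end != -1:
            urls.append(tag[start:end])
    return urls


page = ('<p>Mail <a href="mailto:info@example.com">us</a> or see '
        '<a class="ext" href="https://example.com/docs">the docs</a>.</p>')
emails, links = extract_with_regex(page)
print(emails)                    # ['info@example.com']
print(links)                     # ['mailto:info@example.com', 'https://example.com/docs']
print(hrefs_by_splitting(page))  # same two URLs as above
```

The splitting version needs no regex engine at all, but it silently misses single-quoted or unquoted attributes, which is exactly the kind of variability that pushes people towards a real parser.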

Good luck.



Re: How to extract an entire element from an HTML file?

2018-11-26 Thread Keith Clarke via use-livecode
Thanks for the warning and the link to the parsers, Trevor. 

I get the point regarding unclean HTML - as I won’t be in control of the 
source. Following a cursory glance through the dictionary, I’m also a tad 
concerned about the variability in HTML tag content (e.g. 

content & elements
vs.
content & elements

...and hence, how much wrangling might be needed to identify all the nodes in 
the tree with a specific class, where jQuery’s "$j('.red').html();" saves a lot 
of the heavy lifting involved.

I’ll have a look at those parsers, too - though I doubt my coding chops are up 
to creating a library wrapper - indeed, I’ll have to Google what one is! :-)
Best,
Keith




Re: How to extract an entire element from an HTML file?

2018-11-26 Thread Trevor DeVore via use-livecode
On Mon, Nov 26, 2018 at 3:30 AM Keith Clarke via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Thanks for the steer, Paul - I’ve not worked with XML in LiveCode so
> hadn’t made the connection between the HTML markup structure & XML.


Keith,

I’ve used revXML for parsing HTML in somewhat controlled conditions. While
revXML can work for HTML, your results will vary based on how well
structured the HTML is. If there are tags that are not closed or are out of
balance then revXML won’t give you the results you expect. If you are
generating the HTML then it shouldn’t be a problem. If it is third-party
HTML then you may have to massage the HTML input to get it to work.
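To illustrate why a strict parser struggles with third-party HTML, here is a small Python sketch that uses the standard library's XML parser (xml.etree.ElementTree) as a stand-in for revXML; it is not revXML itself. The sample markup and the massage function are hypothetical. The function only rewrites a handful of void tags such as <br> into self-closing form and does nothing about other common breakages, such as undeclared entities like &nbsp;.

```python
import re
import xml.etree.ElementTree as ET

# Typical hand-written HTML: <br> is never closed, so a strict XML parser rejects it.
raw = '<div class="red"><p>Line one<br>Line two</p></div>'


def parses_as_xml(text):
    """True if the text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One crude "massage": rewrite some HTML void elements as self-closing XML tags.
VOID_TAG_RE = re.compile(r"<(br|hr|img|input|meta|link)(\b[^>]*?)\s*/?>", re.IGNORECASE)


def massage(html):
    return VOID_TAG_RE.sub(r"<\1\2/>", html)


print(parses_as_xml(raw))           # False
print(parses_as_xml(massage(raw)))  # True
print(massage(raw))                 # <div class="red"><p>Line one<br/>Line two</p></div>
```

Each fix of this kind handles one known pattern, so the amount of massaging grows with how messy the source pages are.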

It would be great if there were a library wrapper around one of the
dedicated HTML parsers listed on this page:

https://en.m.wikipedia.org/wiki/Comparison_of_HTML_parsers

-- 
Trevor DeVore
ScreenSteps

Re: How to extract an entire element from an HTML file?

2018-11-26 Thread Keith Clarke via use-livecode
Thanks for the steer, Paul - I’ve not worked with XML in LiveCode so hadn’t 
made the connection between the HTML markup structure & XML.

A quick scan through suggests that this library could prove really useful - 
I’ll dig further and look for related resources, thanks.
Best,
Keith   




Re: How to extract an entire element from an HTML file?

2018-11-25 Thread Paul Dupuis via use-livecode
You could do this with revXML (see the dictionary), but it is not a
single call.
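The "not a single call" point can be sketched as separate steps: parse the markup into a tree, visit every element, test its class list, and serialise the matching element's contents. The Python below uses xml.etree.ElementTree rather than revXML, and the function name and sample markup are hypothetical. Like any strict XML parser, it only works on markup that is already well-formed.

```python
import xml.etree.ElementTree as ET


def inner_html_by_class_xml(markup, cls):
    """Return the inner markup of every element whose class list contains cls."""
    root = ET.fromstring(markup)                    # step 1: build the tree
    results = []
    for el in root.iter():                          # step 2: visit every element
        if cls in (el.get("class") or "").split():  # step 3: test its class list
            # Step 4: inner markup = leading text + each child serialised.
            # ET.tostring also emits each child's trailing (tail) text.
            parts = [el.text or ""]
            parts += [ET.tostring(child, encoding="unicode") for child in el]
            results.append("".join(parts))
    return results


doc = ('<html><body><div class="red box"><p>One <b>two</b></p></div>'
       '<div class="blue">no</div></body></html>')
print(inner_html_by_class_xml(doc, "red"))   # ['<p>One <b>two</b></p>']
print(inner_html_by_class_xml(doc, "blue"))  # ['no']
```

Splitting the class attribute on whitespace means class="red box" still matches "red", which is the behaviour jQuery's class selector has.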






How to extract an entire element from an HTML file?

2018-11-25 Thread Keith Clarke via use-livecode
Folks,
Can anyone please guide me towards a LiveCode feature (or features) that might provide 
the equivalent of the JavaScript jQuery library’s "jQuery('.class').html();" 
mechanism, which selects an entire element’s content (including 
nested elements) from the page DOM?

I have experimented with using jQuery in a browser widget for this purpose, but 
it introduces dependencies & integration complexities - and I’d prefer to work 
without needing a desktop UI to contain the browser widget, so that the 
code could potentially run on LC Server. 

I can see how I might build a 'roll-your-own' approach, using LiveCode’s 
powerful text & chunk features. This would seem to need the HTML file to be 
pre-processed, iterating through the tags of the text file both to find & mark 
each nesting level within elements and to ‘pair up’ the (anonymous) 
closing tags.
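The roll-your-own approach described above can be sketched in a few dozen lines. This Python version uses a hypothetical function and sample page; LiveCode's chunk expressions and offset function could do the same bookkeeping. It scans the tags in order, keeps a stack of open elements to track nesting, pairs each closing tag with the nearest open element of the same name, and slices out the text between the pair. It is only a sketch: it skips void tags like <br> and tolerates missing closing tags, but it does not understand comments, <script> contents, or a ">" inside an attribute value.

```python
import re

# Opening or closing tag: group 1 is "/" for a closing tag, group 2 the tag
# name, group 3 the raw attribute text (assumes no ">" inside attributes).
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w-]*)([^>]*)>")
# A class attribute not preceded by a name character (so "data-class" is skipped).
CLASS_RE = re.compile(r"""(?<![\w-])class\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
# HTML void elements never have a closing tag, so they never go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def inner_html_by_class(html, cls):
    """Return the inner HTML of every element whose class list contains cls,
    in document order."""
    stack = []  # open elements: (name, offset just after opening tag, matches?)
    found = []  # (content start offset, inner HTML) for matched elements
    for m in TAG_RE.finditer(html):
        closing, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
        if not closing:
            if name in VOID_TAGS or attrs.rstrip().endswith("/"):
                continue  # nothing to pair up later
            c = CLASS_RE.search(attrs)
            stack.append((name, m.end(), bool(c) and cls in c.group(1).split()))
            continue
        # Pair this closing tag with the nearest open element of the same name.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i][0] == name:
                # Elements opened after it were never closed: end them here too.
                for _, start, is_match in stack[i:]:
                    if is_match:
                        found.append((start, html[start:m.start()]))
                del stack[i:]
                break
        # A closing tag with no open partner falls through and is ignored.
    return [inner for _, inner in sorted(found)]


page = ('<div class="red box"><p>One <b>two</b></p>'
        '<div class="red">three</div></div><span class="blue">no</span>')
print(inner_html_by_class(page, "red"))
# ['<p>One <b>two</b></p><div class="red">three</div>', 'three']
print(inner_html_by_class('<div class="red"><p>open<br>para</div>', "red"))
# ['<p>open<br>para']
```

Recording the offset just after each opening tag is what makes the nesting problem tractable: once the matching closing tag is found, the whole element, nested children included, is a single slice of the original text.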

Is there a smarter way - any HTML parsing utilities/libraries/lessons/stacks I 
should study?
Thanks
Keith