
Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-08-02 Thread Thierry Douez via use-livecode

2017-08-02 17:54 GMT+02:00 Sannyasin Brahmanathaswami via use-livecode <
use-livecode@lists.runrev.com>:

> Responding on top
>
> Jacque's method only gets us a list, not an array, so one ends up having
> to write more code to parse the list anyway; your method is more efficient.
>
> "not comfortable with RegEx"  Ha, right, but it is worth the effort to keep
> the little grey cells green! I will have to study the regEx… things like ?ms
> are "brand new" to me.
>
>

So, you win your first Regex training :)

(?ms) sets two regex options:

m means multi-line: ^ and $ also match at the start and end of each line.
s means the dot ( '.' ) can also match a return/cr/lf char.
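
As a quick check of what the two options change, here is a small sketch. It is written in Python rather than LiveCode, on the assumption that Python's `re` module treats the inline `(?m)` and `(?s)` options the same way as the PCRE engine behind matchText/matchChunk. The sample string is made up.

```python
import re

# Made-up sample: two lines separated by a line feed.
text = "first line\nsecond line"

# Without (?s) the dot stops at the line break, so nothing is found.
no_s = re.search(r"first(.+)second", text)

# With (?s) the dot also matches the line break.
with_s = re.search(r"(?s)first(.+)second", text)

# Without (?m), ^ only matches at the very start of the text.
no_m = re.search(r"^second", text)

# With (?m), ^ also matches right after each line break.
with_m = re.search(r"(?m)^second", text)

print(no_s, repr(with_s.group(1)), no_m, with_m.group(0))
```

The first and third searches find nothing; the second captures the text between the two words, line break included, and the fourth finds "second" at the start of the second line.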



>
>
> re: extracting the head first: I was under the impression your repeat loop
> would have to work through the entire text of _HTML unnecessarily and that
> extracting the heads would reduce processing time.



Well, you are right,
but only partly: the regex scans to the end of the html just once, when it
tries to match after the last valid pattern.

What is most costly is the delete inside the loop, so working only with the
<head>...</head> part of your html might be more efficient in this case. But
this is more an LC thing.




> OTOH, Andre tells me that for this kind of operation, even cell phones
> have CPU's that are more powerful than some desktop machines and so perhaps
> the time to loop through the entire html source is too trivial to consider
> at all.
>

Yep, as I said, the regex runs through to the end of the html only once,
after the last match. As for quality, restricting the regex to the <head>
part is a good idea, as you never know what some html might contain in the
future...
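
The two points above (cut the text down to the head, and avoid deleting from the text on every pass) can be sketched as follows. This is a Python illustration, not LiveCode; the tag layout `<meta property="og:KEY" content="VALUE">` is an assumption, and `OG_RX` and `og_pairs_from_head` are hypothetical names. The pattern uses `[^"]` instead of the thread's `.+?` so that a capture cannot run past a closing quote.

```python
import re

# Assumed tag layout: <meta property="og:KEY" content="VALUE">
OG_RX = re.compile(r'<meta\s+property="og:([^"]+)"\s+content="([^"]*)"\s*/?>')

def og_pairs_from_head(html):
    """Return (key, value) pairs for og: tags found before </head>."""
    end = html.find("</head>")
    head = html if end == -1 else html[:end]
    # finditer resumes after each match, so nothing is deleted from the text.
    return [(m.group(1), m.group(2)) for m in OG_RX.finditer(head)]

# Made-up page: one og: tag in the head, one look-alike in the body.
sample = (
    '<html><head><meta property="og:type" content="profile"></head>'
    '<body><meta property="og:type" content="not-in-head"></body></html>'
)
pairs = og_pairs_from_head(sample)
print(pairs)
```

Only the tag inside the head is returned; the look-alike in the body is never scanned.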



>
> Thanks for the effort you put into this.


You're welcome.

Kind regards,

Thierry




Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-08-02 Thread Sannyasin Brahmanathaswami via use-livecode

Responding on top

Jacque's method only gets us a list, not an array, so one ends up having to 
write more code to parse the list anyway; your method is more efficient.

"not comfortable with RegEx"  Ha, right, but it is worth the effort to keep the 
little grey cells green! I will have to study the regEx… things like ?ms
are "brand new" to me.

re: extracting the head first: I was under the impression your repeat loop 
would have to work through the entire text of _HTML unnecessarily and that 
extracting the heads would reduce processing time. OTOH, Andre tells me that 
for this kind of operation, even cell phones have CPU's that are more powerful 
than some desktop machines and so perhaps the time to loop through the entire 
html source is too trivial to consider at all.

Thanks for the effort you put into this. We are adding OG tags to all the media 
on our web site (eventually) and our apps will need to parse that out in 
various contexts.

BR

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your 
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-08-02 Thread Thierry Douez via use-livecode

2017-08-02 6:45 GMT+02:00 Sannyasin Brahmanathaswami:

Hi Brahmanathaswami,

> Thanks Thierry
>
> though I'm not yet sure that using regEx is better than using Jacque's
> method

Those are 2 different ways,
but with the regex one you have the exact key and value of each tag,
with nothing more to do.

> Either way it would seem prudent to extract the head first before processing

Mmm, I don't really see why, but I've added a line of code for this too
below.

> Using Jacque's method just gets the list,
> and we need to do more coding to get the array we need.
>
> But your method can only handle 1 tag.

I was aware of that but didn't know what you wanted to achieve, so I
left it for the reader.
However, this has nothing to do with the regex; it comes from the code inside
the repeat loop.

Here is another way to do it, changing only *1* line of code inside the loop
with the same regex as before:

   -- to please BR wishes, but not necessary
   -- erase everything after </head>
   -- (pattern assumed: the archive dropped the tag, taken here to be </head>;
   --  (?s) with a greedy .* runs to the very end of the text)
   put replaceText( _Html, "(?s)</head>.*$", empty) into _Html

   repeat while matchChunk( _Html, Rx, p1,p2,p3,p4 )
      put char p1 to p2 of _Html & tab & char p3 to p4 of _Html & cr after Rslt
      delete char 1 to p4 of _Html
   end repeat
   delete last char of Rslt -- extra cr

   put Rslt into fld 1
   answer "Got " & the number of lines of Rslt & " og: meta tags!"

To build a multi-dimensional array after the extraction,
a bit more work will be needed inside the repeat loop,
but the extraction part is still valid.

Finally, if you are not at ease with regex, go with Jacque's way and
everything will be fine.
Fundamentally, there is not much difference between the 2 ways.

Kind regards,

Thierry


-- 

Thierry Douez - sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage

Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-08-01 Thread Sannyasin Brahmanathaswami via use-livecode

Thanks Thierry

though I'm not yet sure that using regEx is better than using Jacque's method


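on parseHeader pData
   -- the archive stripped the tag text from this handler; the delimiter and
   -- the "og:" test below are assumed from the output listed further down
   set the lineDel to "<meta property="
   repeat for each line l in pData
      if l begins with quote & "og:" then put char 1 to offset(">",l)-1 of l & cr after tList
   end repeat
   -- do something with tList
end parseHeader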

Either way it would seem prudent to extract the head first before processing

put the htmlText of widget "youtubes" into _HTML # interesting convention of underscore usage for var declaration
put char (offset("<head>",_HTML)) to ((offset("</head>",_HTML))+6) of _HTML into tHead

Using Jacque's method just gets the list, and we need to do more coding to get 
the array we need.
It returns:

"og:site_name" content="YouTube"
"og:url" content="https://www.youtube.com/user/kauaiaadheenam"
"og:title" content="Kauai's Hindu Monastery"
"og:image" content="https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg"
"og:description" content="{where hinduism meets the future}"
"og:type" content="profile"
"og:video:tag" content="kauai"
"og:video:tag" content="hawaii"
"og:video:tag" content="hindu"
"og:video:tag" content="hinduism"
"og:video:tag" content="siva"
# And many more tags total of 39 tags…

But your method can only handle 1 tag.

description:{where hinduism meets the future}
image:https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg
site_name:YouTube
title:Kauai's Hindu Monastery
type:profile
url:https://www.youtube.com/user/kauaiaadheenam
video:tag:scriptural   

# The rest of the tags, all 38 preceding ones, are lost. "scriptural" was 
# the last one, so it stands as the final output for that key: the loop 
# effectively retains the single key "og:video:tag" and replaces its value 39 
# times, leaving us with only the value of the 39th tag.
# so we would need an ordered multi-dimensional array like

OG["site_name"]
# and the other top keys, then:
OG["video"]["tags"][1]  
OG["video"]["tags"][2]  
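
A flat variant of that idea, sketched in Python rather than LiveCode: every key maps to an ordered list of its values, so a repeated key such as `video:tag` keeps all of them instead of only the last. The tag layout in the pattern is an assumption, and `OG_RX` and `og_values` are hypothetical names.

```python
import re

# Assumed tag layout: <meta property="og:KEY" content="VALUE">
OG_RX = re.compile(r'<meta\s+property="og:([^"]+)"\s+content="([^"]*)"')

def og_values(html):
    """Map each og: key to the list of all its values, in page order."""
    found = {}
    for key, value in OG_RX.findall(html):
        # Append instead of assign, so earlier values are not overwritten.
        found.setdefault(key, []).append(value)
    return found

# Made-up excerpt using values quoted in this thread.
sample = (
    '<meta property="og:site_name" content="YouTube">'
    '<meta property="og:video:tag" content="kauai">'
    '<meta property="og:video:tag" content="hawaii">'
)
og = og_values(sample)
print(og["site_name"])
print(og["video:tag"])
```

In LiveCode terms this corresponds to counting the occurrences of a key and storing each value under its own numeric sub-key, rather than putting the value straight into OG[key].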

But I'm not sure we need tags for the particular use case in question, which is 
to create a robust "history" of web viewing with more detail. OTOH, since we 
are coding for "Oh God" data, we may as well get all the tags into the array. 
This could be useful later to have this code in the toolbox for when we *do* 
want all the tags from the OG set… God does not like to see partial metadata, 
because S/He Knows All the Metadata.

BR







Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-31 Thread Mark Wieder via use-livecode


On 07/29/2017 01:16 PM, Sannyasin Brahmanathaswami via use-livecode wrote:





LOL. I guess Brahmanathaswami's been around these parts long enough by 
now to have OG status.


--
 Mark Wieder
 ahsoftw...@gmail.com


Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-31 Thread Thierry Douez via use-livecode

2017-07-29 22:16 GMT+02:00 Sannyasin Brahmanathaswami:


> you want to extract from the <head> of the document the openGraph tags
>
> <meta property="og:site_name" content="YouTube">
> <meta property="og:url" content="https://www.youtube.com/user/kauaiaadheenam">
> <meta property="og:title" content="Kauai's Hindu Monastery">
> <meta property="og:image" content="https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg">
> <meta property="og:description" content="{where hinduism meets the future}">
>
> c) you also cannot depend on the output being line delimited, because some
> CMS's delivery "agents" will minimize this to
>
> <meta property="og:site_name" content="YouTube"><meta property="og:url" content="https://www.youtube.com/user/kauaiaadheenam"><meta property="og:title" content="Kauai's Hindu Monastery"><meta property="og:image" content="https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg"><meta property="og:description" content="{where hinduism meets the future}">
>
> Has anyone rolled up a parser/scraper for this?
> Looks like "idiot simple text extraction"



Hi,

Here is a quickly written piece of code, tested only on your URL.
I wrote this regex based on the data you provided in your email.

> I see the other thread on scraping pages generated by JS and suspect
> perhaps some wizard among us already has this done…would save a bit of time
> here.
>
> BR

Every time you see any kind of scraping/search/extraction/transformation
in JS, you can be sure
it's possible to do it in LiveCode.

So, here is the code:

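   local Rx, Rslt, _Html, OG

   put empty into Rslt
   put URL "https://www.youtube.com/user/kauaiaadheenam" into _Html

   -- \x{22} is a double quote. The tag text inside the pattern was stripped
   -- by the list archive; the <meta property=... content=...> part is assumed.
   get "(?ms)<meta property=\x{22}og:(.+?)\x{22} content=\x{22}(.+?)\x{22}>"
   put IT into Rx

   repeat while matchChunk( _Html, Rx,p1,p2,p3,p4 )
      -- p1..p2 is the key after "og:", p3..p4 is the content value
      put char p3 to p4 of _Html into OG[ char p1 to p2 of _Html ]
      delete char 1 to p4 of _Html
   end repeat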



and you can test it this way:

   combine OG using return and ":"
   put OG into fld 1





HTH and feel free to ask any question...

Kind regards,

Thierry

-- 

Thierry Douez - sunny-tdz.com
sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage

Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-30 Thread Sannyasin Brahmanathaswami via use-livecode

" delimiters can now be more than a single character."

Hmm, that completely did not cross my mind… awesome..  

 


Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-29 Thread J. Landman Gay via use-livecode

Here's where it's handy that delimiters can now be more than a single 
character. This should extract the lines you need regardless of whether 
they contain carriage returns or not:



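on parseHeader pData
   -- the archive stripped the tag text from this handler; the delimiter and
   -- the "og:" test below are assumed from the output quoted later in the thread
   set the lineDel to "<meta property="
   repeat for each line l in pData
      if l begins with quote & "og:" then put char 1 to offset(">",l)-1 of l & cr after tList
   end repeat
   -- do something with tList
end parseHeader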
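
The multi-character delimiter technique can also be sketched outside LiveCode. The Python version below is only an illustration: the delimiter `<meta property=` and the `"og:` test are assumptions based on the output quoted elsewhere in this thread, `og_lines` is a hypothetical name, and the `fb:app_id` value is made up.

```python
def og_lines(html, delimiter="<meta property="):
    """Split on a multi-character delimiter and keep the og: chunks."""
    kept = []
    # The text before the first delimiter is not a tag, so it is skipped.
    for chunk in html.split(delimiter)[1:]:
        if chunk.startswith('"og:'):
            end = chunk.find(">")
            # Keep everything up to, but not including, the closing ">".
            kept.append(chunk if end == -1 else chunk[:end])
    return kept

# Made-up excerpt: line breaks between tags are optional.
sample = (
    '<head><meta property="og:site_name" content="YouTube">\n'
    '<meta property="fb:app_id" content="123"><meta property="og:type" '
    'content="profile"></head>'
)
lines = og_lines(sample)
print("\n".join(lines))
```

Because the split happens on the delimiter and not on line breaks, minified and line-delimited pages are handled the same way. A ">" inside a content value would cut that entry short.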






--
Jacqueline Landman Gay | jac...@hyperactivesw.com
HyperActive Software   | http://www.hyperactivesw.com



Re: Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-29 Thread Jonathan Lynch via use-livecode

Hi Swami, I know you can do this in Javascript, but you will have to enumerate 
through a JavaScript object to get all of the properties:

https://www.w3schools.com/jsref/prop_meta_content.asp

Sent from my iPhone


Parsing (scraping) OpenGraph Tags from html HEAD

2017-07-29 Thread Sannyasin Brahmanathaswami via use-livecode

given that

a) trying to instantiate an XML tree from any given web page is likely to fail 
85% of the time because they simply are never built to that strict a standard


and


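b) you want to extract from the <head> of the document the openGraph tags

<meta property="og:site_name" content="YouTube">
<meta property="og:url" content="https://www.youtube.com/user/kauaiaadheenam">
<meta property="og:title" content="Kauai's Hindu Monastery">
<meta property="og:image" content="https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg">
<meta property="og:description" content="{where hinduism meets the future}">

c) you also cannot depend on the output being line delimited, because some 
CMS's delivery "agents" will minimize this to

<meta property="og:site_name" content="YouTube"><meta property="og:url" content="https://www.youtube.com/user/kauaiaadheenam"><meta property="og:title" content="Kauai's Hindu Monastery"><meta property="og:image" content="https://yt3.ggpht.com/-p766LczvKHY/AAI/AAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xff/photo.jpg"><meta property="og:description" content="{where hinduism meets the future}">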

Has anyone rolled up a parser/scraper for this?   Looks like "idiot simple text 
extraction"  but I'm trying to wrap my head around how to extract the 
name=value pairs, and not getting anything easy…  these are space delimited, 
but then we also have spaces inside quoted strings.  Maybe it is easier to 
target "<meta ...>" using regEx with matchText, get ALL the meta tags in the 
HEAD, push them to an array, then just check whether the key contains "og:"; 
if so, we have an openGraph value.
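
The "collect every meta tag, then keep the og: ones" idea can be sketched with a lenient HTML parser instead of an XML tree. The example below uses Python's standard `html.parser` module as an illustration only (it is not LiveCode); `OGCollector` is a hypothetical name and the sample markup is made up around values from this message. It also copes with spaces inside quoted values, with attributes in either order, and with markup that is not well-formed XML.

```python
from html.parser import HTMLParser

class OGCollector(HTMLParser):
    """Look at every <meta> tag and keep those whose property starts with og:."""

    def __init__(self):
        super().__init__()
        self.og = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.og.append((prop, attributes.get("content")))

# Made-up page: attribute order varies and one paragraph is never closed.
sample = (
    '<head><title>x</title><meta charset="utf-8">'
    '<meta property="og:title" content="Kauai\'s Hindu Monastery">'
    '<meta content="{where hinduism meets the future}" property="og:description">'
    '</head><body><p>unclosed paragraph</body>'
)
collector = OGCollector()
collector.feed(sample)
collector.close()
print(collector.og)
```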

I'll sleep on this, but before I wake up and write 50 lines to get this 
done…  I see the other thread on scraping pages generated by JS and suspect 
perhaps some wizard among us already has this done…would save a bit of time 
here.

BR




