Re: Parsing (scraping) OpenGraph Tags from html HEAD

Jonathan Lynch via use-livecode Sat, 29 Jul 2017 13:33:48 -0700

Hi Swami, I know you can do this in JavaScript, but you will have to iterate 
over a JavaScript object to get all of the properties:


https://www.w3schools.com/jsref/prop_meta_content.asp
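
Something along these lines might work as a starting point. It is only a rough 
sketch: the function name collectOpenGraph is made up for illustration, and it 
assumes the page has already been loaded into a real DOM (for example inside a 
browser widget), so that a Document object is available to pass in.

```javascript
// Hypothetical helper: collect OpenGraph <meta> tags into a plain object.
// `doc` is assumed to be a DOM Document (e.g. `document` in a loaded page).
function collectOpenGraph(doc) {
  var result = {};
  // Attribute selector: every <meta> whose property attribute starts with "og:"
  var tags = doc.querySelectorAll('meta[property^="og:"]');
  for (var i = 0; i < tags.length; i++) {
    var key = tags[i].getAttribute('property');
    var value = tags[i].getAttribute('content');
    // If a property repeats (several og:image tags, say), the last one wins.
    result[key] = value;
  }
  return result;
}
```

Calling JSON.stringify(collectOpenGraph(document)) would then give you one 
string to hand back to your script. Because the browser does the HTML parsing, 
it should not matter whether the tags are line delimited or minified.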

Sent from my iPhone

> On Jul 29, 2017, at 4:16 PM, Sannyasin Brahmanathaswami via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> given that
> 
> a) trying to instantiate an XML tree from any given web page is likely to 
> fail 85% of the time because they simply are never built to that strict a 
> standard
> 
> 
> and
> 
> 
> b) you want to extract the OpenGraph tags from the <head> of the document
> 
> <meta property="og:site_name" content="YouTube">
> <meta property="og:url" content="https://www.youtube.com/user/kauaiaadheenam">
> <meta property="og:title" content="Kauai's Hindu Monastery">
> <meta property="og:image" 
> content="https://yt3.ggpht.com/-p766LczvKHY/AAAAAAAAAAI/AAAAAAAAAAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xffffff/photo.jpg">
> <meta property="og:description" content="{where hinduism meets the future}">
> 
> c) you also cannot depend on the output being line delimited, because some 
> CMS delivery "agents" will minify this to
> 
> <meta property="og:site_name" content="YouTube"><meta property="og:url" 
> content="https://www.youtube.com/user/kauaiaadheenam"><meta 
> property="og:title" content="Kauai's Hindu Monastery"><meta 
> property="og:image" 
> content="https://yt3.ggpht.com/-p766LczvKHY/AAAAAAAAAAI/AAAAAAAAAAA/SIu6ZAJbMDc/s900-c-k-no-mo-rj-c0xffffff/photo.jpg"><meta
>  property="og:description" content="{where hinduism meets the future}">
> 
> Has anyone rolled up a parser/scraper for this? It looks like "idiot simple 
> text extraction", but I'm trying to wrap my head around how to extract the 
> name=value pairs and not finding anything easy… They are space delimited, 
> but there are also spaces inside the quoted strings. Maybe an easier way: 
> target "<meta (.*?)>" using regex with matchText, get ALL the meta tags in 
> the HEAD, push them to an array, then check each key: if it contains "og:", 
> we have an OpenGraph value.
> 
> I'll sleep on this, but before I wake up and write 50 lines to get this 
> done…  I see the other thread on scraping pages generated by JS and suspect 
> perhaps some wizard among us already has this done…would save a bit of time 
> here.
> 
> BR
> 
> 
> 
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode