Re: opengraph not being extracted

Stéphane Corlosquet Sat, 02 Aug 2014 19:18:07 -0700

On Thu, Jul 24, 2014 at 9:19 PM, Lewis John Mcgibbney <
[email protected]> wrote:


> Hi Hadar,
>
> On Thu, Jul 24, 2014 at 3:27 AM, <[email protected]>
> wrote:
>
>>
>> I'm trying to use any23 1.0 to extract opengraph data.
>> I'm simply creating the Any23 class and running extract.
>> It works fine on schema.org but it doesn't extract og tags.
>> Does anything special need to be done?
>>
>
> OK, I found the issue here. Basically, Any23 does recognize the og: markup
> within the <meta> tags, as follows:
>
> <meta property="fb:app_id" content="192959324047861" />
> <meta property="og:title" content="Led Zeppelin" />
> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
> <meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>
> However, there is an issue with the way that last.fm actually publishes
> its data on the web.
> For example, when I run my Any23 master branch code over the webpage, the
> validation report tells me the following:
>
> <validationReport>
>   <errors></errors>
>   <ruleActivations>
>     <ruleActivation><ruleStr>missing-opengraph-namespace-rule</ruleStr></ruleActivation>
>   </ruleActivations>
>   <issues>
>     <issue>
>       <origin>[HTML: null]</origin>
>       <message>Missing OpenGraph namespace declaration.</message>
>     </issue>
>   </issues>
> </validationReport>
>
> Basically, there is no namespace declared to accompany the og:
> markup.
>
> The question for Any23 is whether or not we should acknowledge the absence
> of the namespace declaration and provide one anyway in an effort to
> continue with extraction.
>
> Do you think this would be valuable? If it is, then I can write the
> implementation and post a patch for you to try out.
>

No, I think this would be a bad idea because RDFa already provides such
functionality. The RDFa Core Initial Context
<http://www.w3.org/2011/rdfa-context/rdfa-1.1> includes og, and therefore
all parsers should recognize it. That means the prefix declaration for og
can be omitted (that's why semargl and other RDFa parsers have no problem
extracting data from that page). The problem doesn't come from the RDFa
parser, but from the HTML parser. I want to make sure you've seen my
comment in Jira which includes more info:
https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12730068&commentId=14083838
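
To illustrate the fallback I mean, here is a minimal Python sketch. It is not
Any23 or semargl code and not a real RDFa parser; the names INITIAL_CONTEXT,
expand_curie and MetaPropertyCollector are made up for this example, and the
prefix table is only a small subset of the initial context. It shows a
property such as og:title resolving without any prefix declaration in the
page, while an undeclared prefix that is not in the initial context (fb here)
does not resolve:

```python
from html.parser import HTMLParser

# Partial subset of the RDFa Core Initial Context prefix mappings
# (assumption: only the entries needed for this illustration are listed).
INITIAL_CONTEXT = {
    "og": "http://ogp.me/ns#",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, declared_prefixes):
    """Expand 'prefix:reference' to a full IRI, or return None.

    Prefixes declared in the document win; otherwise fall back to the
    initial context, so 'og:' works even when the page declares nothing.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None
    namespace = declared_prefixes.get(prefix) or INITIAL_CONTEXT.get(prefix)
    if namespace is None:
        return None
    return namespace + reference


class MetaPropertyCollector(HTMLParser):
    """Collect (property IRI, content) pairs from <meta> elements."""

    def __init__(self, declared_prefixes=None):
        super().__init__()
        self.declared_prefixes = declared_prefixes or {}
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        if prop is None or content is None:
            return
        iri = expand_curie(prop, self.declared_prefixes)
        if iri is not None:
            self.statements.append((iri, content))


if __name__ == "__main__":
    collector = MetaPropertyCollector()
    collector.feed(
        '<meta property="fb:app_id" content="192959324047861" />'
        '<meta property="og:title" content="Led Zeppelin" />'
    )
    for iri, value in collector.statements:
        print(iri, value)
```

The point is only that the lookup falls back to a built-in table, so nothing
needs to be injected into the document before parsing.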



> Thanks
> Lewis
>



-- 
Steph.
