On Sat, Aug 2, 2014 at 10:17 PM, Stéphane Corlosquet <[email protected]>
wrote:

>
>
>
> On Thu, Jul 24, 2014 at 9:19 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi Hadar,
>>
>> On Thu, Jul 24, 2014 at 3:27 AM, <[email protected]>
>> wrote:
>>
>>>
>>> I'm trying to use Any23 1.0 to extract OpenGraph data.
>>> I'm simply creating the Any23 class and running extract().
>>> It works fine on schema.org markup, but it doesn't extract og tags.
>>> Does anything special need to be done?
>>>
>>>
>> OK, I found the issue here. Basically, Any23 does recognize the og: markup
>> within the <meta> tags, as follows:
>>
>> <meta property="fb:app_id" content="192959324047861" />
>> <meta property="og:title" content="Led Zeppelin" />
>> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
>> <meta property="og:image" content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>>
>> However, there is an issue with the way last.fm actually publishes
>> its data on the web.
>> For example, when I run Any23 built from the master branch over the page,
>> the validation report shows the following:
>>
>> <validationReport>
>>   <errors>
>>   </errors>
>>   <ruleActivations>
>>     <ruleActivation>
>>       <ruleStr>missing-opengraph-namespace-rule</ruleStr>
>>     </ruleActivation>
>>   </ruleActivations>
>>   <issues>
>>     <issue>
>>       <origin>[HTML: null]</origin>
>>       <message>Missing OpenGraph namespace declaration.</message>
>>     </issue>
>>   </issues>
>> </validationReport>
>>
>> Basically, there is no namespace declared to accompany the og:
>> markup.
>>
>> The question for Any23 is whether or not we should acknowledge the
>> absence of the namespace declaration and provide one anyway, so that
>> extraction can continue.
>>
>> Do you think this would be valuable? If so, I can write the
>> implementation and post a patch for you to try out.
>>
>
> No, I think this would be a bad idea, because RDFa already provides such
> functionality. The RDFa Core Initial Context
> <http://www.w3.org/2011/rdfa-context/rdfa-1.1> includes the og prefix, so
> all RDFa parsers should recognize it. That means the prefix declaration for
> og can be omitted (that's why Semargl and other RDFa parsers have no problem
> extracting data from that page). The problem doesn't come from the RDFa
> parser, but from the HTML parser. I want to make sure you've seen my
> comment in Jira, which includes more info:
>
Here is the link:
https://issues.apache.org/jira/browse/ANY23-227?focusedCommentId=14083838&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14083838
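To illustrate why the og prefix declaration can be omitted, here is a minimal sketch of how an RDFa 1.1 processor might expand a CURIE such as og:title by falling back to the initial context when the page declares no prefixes. It is not Any23's or Semargl's actual code: the class and method names (InitialContextCurie, expand) are hypothetical, and the prefix table is a small hand-picked subset of the W3C initial context, not the full list.

```java
import java.util.HashMap;
import java.util.Map;

public class InitialContextCurie {
    // Hypothetical helper: a small subset of the RDFa 1.1 Initial Context
    // prefix mappings (the real context defines many more).
    static final Map<String, String> INITIAL_CONTEXT = new HashMap<>();
    static {
        INITIAL_CONTEXT.put("og", "http://ogp.me/ns#");
        INITIAL_CONTEXT.put("schema", "http://schema.org/");
        INITIAL_CONTEXT.put("foaf", "http://xmlns.com/foaf/0.1/");
        INITIAL_CONTEXT.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    }

    /**
     * Expands a CURIE such as "og:title" into a full IRI.
     * Prefixes declared in the document take precedence over the initial
     * context; returns null when the prefix is known to neither.
     */
    static String expand(String curie, Map<String, String> declared) {
        int colon = curie.indexOf(':');
        if (colon <= 0) {
            return null; // no prefix part, so not a prefixed CURIE
        }
        String prefix = curie.substring(0, colon);
        String reference = curie.substring(colon + 1);
        String ns = declared.getOrDefault(prefix, INITIAL_CONTEXT.get(prefix));
        return ns == null ? null : ns + reference;
    }

    public static void main(String[] args) {
        // The last.fm page declares no prefixes at all.
        Map<String, String> declared = new HashMap<>();
        for (String property : new String[] {"og:title", "og:url", "fb:app_id"}) {
            System.out.println(property + " -> " + expand(property, declared));
        }
    }
}
```

Run as-is, this prints og:title and og:url expanded into the http://ogp.me/ns# namespace, while fb:app_id comes back null: fb is not part of this subset of the initial context, and the page does not declare it.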





>
>
>> Thanks
>> Lewis
>>
>
>
>
> --
> Steph.
>



-- 
Steph.
