On Sat, Aug 2, 2014 at 10:17 PM, Stéphane Corlosquet <[email protected]> wrote:
>
> On Thu, Jul 24, 2014 at 9:19 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi Hadar,
>>
>> On Thu, Jul 24, 2014 at 3:27 AM, <[email protected]>
>> wrote:
>>
>>> I'm trying to use any23 1.0 to extract OpenGraph data.
>>> I'm simply creating the Any23 class and running extract.
>>> It works fine on schema.org but it doesn't extract og tags.
>>> Does anything special need to be done?
>>
>> OK, I found the issue here. Basically, Any23 does recognize the og:
>> markup within the <meta> tags, as follows:
>>
>> <meta property="fb:app_id" content="192959324047861" />
>> <meta property="og:title" content="Led Zeppelin" />
>> <meta property="og:url" content="http://www.last.fm/music/Led+Zeppelin" />
>> <meta property="og:image"
>> content="http://userserve-ak.last.fm/serve/126/378064.jpg" />
>>
>> However, there is an issue with the way that last.fm actually publishes
>> their data on the web.
>> For example, when I run my Any23 master branch code over the webpage, my
>> validation report notifies me of the following:
>>
>> <validationReport>
>>   <errors>
>>   </errors>
>>   <ruleActivations>
>>     <ruleActivation>
>>       <ruleStr>missing-opengraph-namespace-rule</ruleStr>
>>     </ruleActivation>
>>   </ruleActivations>
>>   <issues>
>>     <issue>
>>       <origin>[HTML: null]</origin>
>>       <message>Missing OpenGraph namespace declaration.</message>
>>     </issue>
>>   </issues>
>> </validationReport>
>>
>> Basically, no namespace is declared to accompany the og: markup.
>>
>> The question for Any23 is whether or not we should acknowledge the
>> absence of the namespace declaration and provide one anyway in an effort to
>> continue with extraction.
>>
>> Do you think this would be valuable? If it is, then I can write the
>> implementation and post a patch for you to try out.
>>
>
> No, I think this would be a bad idea because RDFa already provides such
> functionality. The RDFa Core Initial Context
> <http://www.w3.org/2011/rdfa-context/rdfa-1.1> includes og, and therefore
> all parsers should recognize it. That means the prefix declaration for og
> can be omitted (that's why semargl and other RDFa parsers have no problem
> extracting data from that page). The problem doesn't come from the RDFa
> parser but from the HTML parser. I want to make sure you've seen my
> comment in Jira, which includes more info:

Here is the link: https://issues.apache.org/jira/browse/ANY23-227?focusedCommentId=14083838&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14083838

>
>> Thanks
>> Lewis
>>
>
> --
> Steph.

-- 
Steph.
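To make the initial-context point concrete, here is a minimal Java sketch (Java 9+ for Map.of) of how an RDFa 1.1 processor can resolve a CURIE such as og:title when the page declares no prefix at all. The class name CurieSketch, the expand helper, and the three-entry excerpt of the initial context are hypothetical illustrations, not code from Any23 or semargl; a real processor loads the full context published by the W3C.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: RDFa 1.1 CURIE expansion with an initial-context fallback.
// Not Any23 or semargl code; only a small excerpt of the W3C initial context is used.
class CurieSketch {

    // Excerpt of prefixes that the RDFa 1.1 initial context predefines.
    static final Map<String, String> INITIAL_CONTEXT = Map.of(
            "og", "http://ogp.me/ns#",
            "foaf", "http://xmlns.com/foaf/0.1/",
            "dcterms", "http://purl.org/dc/terms/");

    // Expands "prefix:reference". A prefix declared in the page wins over the
    // initial context; returns null if there is no colon or the prefix is unknown.
    static String expand(String curie, Map<String, String> declared) {
        int colon = curie.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String prefix = curie.substring(0, colon);
        String reference = curie.substring(colon + 1);
        String namespace = declared.getOrDefault(prefix, INITIAL_CONTEXT.get(prefix));
        return namespace == null ? null : namespace + reference;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new HashMap<>(); // the page declares no prefixes
        System.out.println(expand("og:title", declared));  // prints http://ogp.me/ns#title
        System.out.println(expand("fb:app_id", declared)); // prints null: fb is not in this excerpt
    }
}
```

The design choice mirrors the argument in the thread: because the fallback lives in the RDFa processor, no extractor needs to inject a missing og declaration itself.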
