On 2 April 2012 at 12:44, Rüdiger Kurz <[email protected]> wrote:
> Hi Stanbolers,
>
> during the last hackathon, which took place in Saarbrücken alongside the IKS
> review meeting, I had the opportunity to play around with the Stanbol Content
> Hub. I would like to suggest a new feature for the Content Hub:
>
> I have content that is already annotated, and I want to find related content
> using Stanbol. I therefore suggest extending the Stanbol Content Hub with
> support for RDFa extraction.
>
> Benefit:
> The semantic information that is already present will not be lost. RDFa
> generated by the CMS or created by annotate.js can be transferred to
> Stanbol and then be used to retrieve content.
>
> Procedure:
> 1. Send an RDF(a)-annotated HTML document to Stanbol.
> 2. Stanbol's Content Hub extracts the RDFa annotations (e.g. using Clerezza,
>    as Reto mentioned) and stores the document together with its entities.

It is not the role of the Content Hub to do document enhancement. The focus of the Content Hub is to store and query content and its annotations.

Document pre-processing should be handled by the Enhancer (which the Content Hub can call when a new document is uploaded), and this might in fact already be the case: the Metaxa engine should be able to extract the RDFa content of an HTML document. I don't know how it works in practice, though. Try it, and if it does not work, have a look at the source code.

I think the Clerezza developers will also provide RDFa parsers, and maybe serializers too, at some point.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
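
For anyone who wants to try the Enhancer route suggested above, here is a minimal sketch of how an RDFa-annotated HTML page could be posted to the Enhancer over HTTP. The base URL `http://localhost:8080`, the `/enhancer` path (some releases expose `/engines` instead), the `text/turtle` response format, and the helper names `build_enhancer_request` and `enhance` are assumptions for illustration, not confirmed details of any particular setup. Whether the Metaxa engine actually returns the RDFa triples is exactly what this would let you check.

```python
import urllib.request

# Assumed defaults for a locally running Stanbol launcher; adjust as needed.
STANBOL_BASE_URL = "http://localhost:8080"
ENHANCER_PATH = "/enhancer"  # assumption: some releases use "/engines"


def build_enhancer_request(html, base_url=STANBOL_BASE_URL, path=ENHANCER_PATH):
    """Build (but do not send) a POST request carrying an HTML document."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=html.encode("utf-8"),
        headers={
            # Tell the server the payload is HTML so HTML-aware engines can run.
            "Content-Type": "text/html; charset=utf-8",
            # Ask for the enhancement results as Turtle.
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(html):
    """Send the document and return the RDF response as text.

    Requires a running server at the assumed URL.
    """
    with urllib.request.urlopen(build_enhancer_request(html)) as response:
        return response.read().decode("utf-8")


# A tiny hypothetical RDFa-annotated page used only as sample input.
SAMPLE_HTML = (
    '<html><body><p xmlns:dc="http://purl.org/dc/elements/1.1/" about="#doc">'
    '<span property="dc:creator">Rüdiger Kurz</span></p></body></html>'
)

if __name__ == "__main__":
    request = build_enhancer_request(SAMPLE_HTML)
    print(request.get_method(), request.full_url)
```

Calling `enhance(SAMPLE_HTML)` against a running instance and looking for the `dc:creator` statement in the returned Turtle would show whether the RDFa annotations survive the enhancement step.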
