Hi John,
John Giannandrea wrote:
Chris Bizer wrote:
See:
http://blog.freebase.com/2008/03/28/full-data-dumps-are-now-available/
I think that it would be really exciting to turn these dumps into RDF,
publish them on the Web as Linked Data and interlink them with data sets
from the LOD cloud. For instance, interlinking them with DBpedia should be
very easy as both datasets contain Wikipedia article identifiers.
We would be happy to help support this effort to make our data more
LOD friendly.
This would be great.
Getting the data out on the Semantic Web as Linked Data also doesn't
have to be a big effort, as you already have everything that is
needed in place.
One reason we did not yet emit simple RDF ourselves was potential
confusion about mapping specific freebase properties to the larger
range of possible ontologies. It would be simple to declare a new
set of URIs for our schema, much harder to pick and choose from the
large array of available ontologies for the range of our data.
I think for the first iteration it is completely OK if you define a
new set of URIs for your schema. As a second iteration you could
replace terms from your schema with terms from well-known vocabularies
like FOAF or SKOS.
From the LOD perspective a lot would already be won if:
1. there were a URI for each topic in Freebase, and dereferencing
this URI over the Web returned an RDF description of the concept
using a Freebase-specific schema.
2. this URI were interlinked with other data sources in the LOD
cloud, so that people could use Semantic Web browsers to navigate
from these data sources into the Freebase data and so that Semantic
Web crawlers could find and index the data.
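To make point 1 a bit more concrete, here is a minimal Python sketch of
what such an RDF description could look like when serialized as N-Triples.
The two base URIs, the property names and the function names are
assumptions made up for illustration; they are not an existing Freebase
schema or API.

```python
# Hypothetical base URIs; Freebase would choose its own.
FREEBASE_RESOURCE = "http://www.freebase.com/rdf/resource/"
FREEBASE_SCHEMA = "http://www.freebase.com/rdf/schema/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def describe_topic(topic_id, literals, links):
    """Return an N-Triples description of one topic.

    literals: dict mapping a schema property name to a string value
    links:    dict mapping a schema property name to the URI of another resource
    """
    subject = "<%s%s>" % (FREEBASE_RESOURCE, topic_id)
    lines = []
    for prop, value in sorted(literals.items()):
        lines.append('%s <%s%s> "%s" .'
                     % (subject, FREEBASE_SCHEMA, prop, escape_literal(value)))
    for prop, target in sorted(links.items()):
        lines.append("%s <%s%s> <%s> ."
                     % (subject, FREEBASE_SCHEMA, prop, target))
    return "\n".join(lines) + "\n"
```

A wrapper around the API would return this text whenever a client
dereferences the topic URI. The links dict is where pointers to other
Freebase topics or to other data sets would go.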
So, a minimal effort approach to getting Freebase onto the Semantic
Web could look like this:
1. Define URIs for all your concepts, something like
http://www.freebase.com/rdf/resource/9202a8c04000641f800000000016a1a7
2. Deploy a Linked Data wrapper around your API that returns an RDF
description of (in the example above) the film when somebody
dereferences the URI above. A very easy way to implement such a
wrapper would be to just tweak the PHP script that we are using for
the RDF Book mashup. The script is found at
http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/index.html
3. Interlink this RDF version of Freebase with other data sources. The
simplest option here would be to interlink Freebase with DBpedia as
both datasets contain Wikipedia article IDs. So what you would do is
add an RDF link stating that a specific concept in Freebase is the same
as a concept in DBpedia to the RDF you return when one of your URIs
gets dereferenced. For instance:
http://www.freebase.com/rdf/resource/9202a8c04000641f800000000016a1a7
owl:sameAs http://dbpedia.org/resource/2046_(film)
4. You would send us an RDF file containing these RDF links for all
Freebase concepts and we would load it into DBpedia and also serve
these links.
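Steps 3 and 4 could be sketched roughly as follows in Python. This
assumes you can export (Freebase topic ID, Wikipedia article title) pairs
from your data, and that the DBpedia URI is the Wikipedia title with
spaces replaced by underscores. The function names and the exact
percent-encoding of special characters are my assumptions, so please
check the output against real DBpedia URIs before relying on it.

```python
from urllib.parse import quote

FREEBASE_BASE = "http://www.freebase.com/rdf/resource/"  # URI pattern from step 1
DBPEDIA_BASE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_uri(wikipedia_title):
    """Build a DBpedia resource URI from a Wikipedia article title."""
    name = wikipedia_title.strip().replace(" ", "_")
    # Assumption: leave common title punctuation unescaped, escape the rest.
    return DBPEDIA_BASE + quote(name, safe="_()',:!*")


def same_as_triple(freebase_id, wikipedia_title):
    """Return one owl:sameAs link as an N-Triples line."""
    return "<%s%s> <%s> <%s> ." % (
        FREEBASE_BASE, freebase_id, OWL_SAME_AS, dbpedia_uri(wikipedia_title))


def same_as_links(topics):
    """Return an N-Triples document for an iterable of (id, title) pairs."""
    return "\n".join(same_as_triple(i, t) for i, t in topics) + "\n"


if __name__ == "__main__":
    print(same_as_links(
        [("9202a8c04000641f800000000016a1a7", "2046 (film)")]), end="")
```

The same lines can be added to the RDF returned for each URI (step 3)
and concatenated into the file you send us (step 4).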
I think all this could be done within three days of work and would allow
Linked Data browsers, like the ones listed here
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemWebClients,
to access and navigate between both datasets and would allow crawlers,
like the ones listed here
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemanticWebSearchEngines,
to index both datasets.
What do you think?
Technical background information about the whole process is found in
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
After this, one could start thinking about also providing RDF dumps, so
that people could load Freebase and DBpedia together into an RDF store
and do whatever they want with the data. Or think about using
well-known terms from other vocabularies and ontologies.
We have been experimenting with using freebase itself to help
catalog compatible ontologies for specific freebase properties.
For example
http://www.freebase.com/view/user/jamie/web_ontology/property_mapping
If folks want to help with this, then it should be possible to use
our open API to generate RDF of whatever 'flavor' you happen to be
working with, by specifying a preferred set of ontologies at query
time.
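As a rough illustration of choosing a 'flavor' at query time, here is a
small Python sketch. The mapping table, the fallback schema URI and the
function name are all hypothetical; they are not taken from the
property_mapping page above or from the Freebase API.

```python
# Hypothetical mapping from Freebase property paths to terms in other
# vocabularies, keyed by a short vocabulary name.
PROPERTY_MAPPINGS = {
    "/type/object/name": {
        "foaf": "http://xmlns.com/foaf/0.1/name",
        "skos": "http://www.w3.org/2004/02/skos/core#prefLabel",
    },
    "/common/topic/webpage": {
        "foaf": "http://xmlns.com/foaf/0.1/page",
    },
}

# Hypothetical base URI for Freebase's own schema terms.
FREEBASE_SCHEMA_BASE = "http://www.freebase.com/rdf/schema"


def resolve_property(freebase_property, preferred):
    """Pick the URI to emit for a Freebase property.

    preferred is an ordered list of vocabulary names. The first one that
    has a mapping wins; otherwise fall back to a Freebase-specific URI.
    """
    candidates = PROPERTY_MAPPINGS.get(freebase_property, {})
    for vocabulary in preferred:
        if vocabulary in candidates:
            return candidates[vocabulary]
    return FREEBASE_SCHEMA_BASE + freebase_property
```

For example, resolve_property("/type/object/name", ["skos", "foaf"])
picks the SKOS term, while a property without any mapping keeps its
Freebase-specific URI.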
Using terms from well-known vocabularies and serving the data
using different vocabularies are both important, but in my opinion
something for the second step. First step: Publish linked data. See
what people do with it.
Cheers
Chris
-jg