Hi,
I thought I could share some remarks on the topic. First, well done on
the release of LDIF; it's an interesting piece of work and it's dearly
needed. I have started to release a bit of my work too, although it's at
a very early stage (https://github.com/correndo/mediation).
On 6/30/11 10:49 AM, Ruben Verborgh wrote:
Hi Chris,
Thanks for the fast and detailed reply, it's a very interesting discussion.
Indeed, there are several ways for mapping and identity resolution.
But what strikes me is that people in the community seem to be insufficiently
aware of the possibilities and performance of current reasoners.
About identity resolution:
Silk is a nice framework for discovering identity equivalences, although
I think that to exploit such equivalences a more distributed approach
should be preferred. An approach where the links among entities are
discovered (no matter with what tool) and *shared* would be more organic
to an architecture of distributed data publishing.
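To sketch what I mean by sharing (the URIs below are only illustrative):
the discovered links could be published as their own small RDF document,
one owl:sameAs triple per equivalence, which any consumer can fetch and
merge independently of either dataset.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# An illustrative, shareable link set: each discovered equivalence
# becomes one owl:sameAs triple, published separately from the data.
<http://dbpedia.org/resource/Southampton>
    owl:sameAs <http://sws.geonames.org/2637487/> .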
About reasoners:
I guess on this issue one could distinguish based on where a given
reasoner is applied. Within Linked Data, where the amount of data is
assumed to be huge, applying a reasoner is usually felt to be
impractical. Reasoners just don't scale as well as one would like,
although some triple stores are achieving good performance (OWLIM, 4sr
and others).
As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not
very good at.
Oh yes, they are. All the needed transformations in your paper can be
performed by at least two reasoners, cwm [1] and EYE [2], by using
built-ins [3]. Included are regular expressions, datatype transforms…
Frankly, every transform in the R2R example can be expressed as an N3 rule.
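To make this concrete, here is a hedged sketch of the class mapping from
the R2R example further down as an N3 rule (the genes/smwcat namespace
URIs are placeholders of mine, not the actual vocabularies):

```n3
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix genes:  <http://example.org/genes#> .   # placeholder namespace
@prefix smwcat: <http://example.org/smwcat#> .  # placeholder namespace

# Whatever is typed genes:gene is also typed smwcat:Gene.
{ ?s rdf:type genes:gene . } => { ?s rdf:type smwcat:Gene . } .
```

A rule like this runs unchanged in both cwm and EYE.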
Logical formalisms can be applied to structural data transformation,
although it sounds like a bit of overkill. I think the real issue here
is to find the right tool for the right job. If we have heavyweight
ontologies that differ conceptually from one another, then a reasoner is
the right tool. But what if we're dealing more with different data
schemas that don't require complex reasoning?
There are, I think, two different levels that can be aligned by two
different formalisms: RDF, and OWL.
Aligning RDF graphs is something that has little to do with description
logics; the semantics is inscribed in the structure, and structural
alignments are therefore called for. A preliminary work I published [1]
was based on graph rewriting; it handles query rewriting and was
conceived as a lightweight approach (schema alignment).
On the use of pattern literals: it's a bit like using RDF to describe a
string whose content's semantics is defined elsewhere. It just doesn't
sound right; but then again, even using RDF and reification to describe
at least the basic graph patterns [1] doesn't solve the problem of
semantic elicitation. The interpretation of an alignment is still
relative to a particular tool.
So, instead of writing literals like this:
mp:Gene
    r2r:sourcePattern "?SUBJ a genes:gene" ;
    r2r:targetPattern "?SUBJ a smwcat:Gene" .
I would have written a chunk of RDF pattern graph like this:
mediation:lhs [ a rdf:Statement ; rdf:subject _:SUBJ ; rdf:predicate rdf:type ;
                rdf:object genes:gene ] .
mediation:rhs [ a rdf:Statement ; rdf:subject _:SUBJ ; rdf:predicate rdf:type ;
                rdf:object smwcat:Gene ] .
For aligning OWL ontologies there have been a number of proposals, EDOAL
[2] and C-OWL [3] to name a few, not counting the already mentioned
properties defined in OWL itself (owl:sameAs, owl:equivalentProperty,
owl:equivalentClass). The question for any OWL alignment formalism is
more to find different profiles of complexity that fit different
application cases.
[1] http://eprints.ecs.soton.ac.uk/18370/
[2] http://alignapi.gforge.inria.fr/edoal.html
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.9326&rep=rep1&type=pdf
If I as an application developer
want to get a job done, what does it help me if I can exchange mappings
between different tools that all don't get the job done?
Because different tools can contribute different results, and if you use a
common language and idiom, they all can work with the same data and metadata.
More and more developers know SPARQL, which makes it easier for them to
learn R2R.
The set of developers that know SPARQL is a proper subset of those that
know plain RDF, which is what I suggest using. And even if rules are
necessary, N3 is only a small extension of RDF.
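For a simple class correspondence, for instance, the plain-RDF
alternative I have in mind is a single triple with standard OWL
semantics (the namespace URIs here are again only illustrative):

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix genes:  <http://example.org/genes#> .   # illustrative namespace
@prefix smwcat: <http://example.org/smwcat#> .  # illustrative namespace

# One triple, readable by any RDF tool and interpretable by any
# OWL-aware reasoner, with no tool-specific mapping language needed.
genes:gene owl:equivalentClass smwcat:Gene .
```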
Benchmark we have the feeling that SPARQL engines are more suitable for
this task than current reasoning engines due to their performance problems
as well as problems to deal with inconsistent data.
The extremely solid performance [4] of EYE is too little known. It can
handle in linear time tasks that other reasoners cannot solve at all.
But my main point is semantics. Why make a new system with its own meanings and
interpretations, when there is so much to do with plain RDF and its widely
known vocabularies (RDFS, OWL)?
Ironically, a tool which contributes to the reconciliation of different
RDF sources does not use common vocabularies to express well-known
relationships.
Cheers,
Ruben
[1] http://www.w3.org/2000/10/swap/doc/cwm.html
[2] http://eulersharp.sourceforge.net/
[3] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[4] http://eulersharp.sourceforge.net/2003/03swap/dtb-2010.txt
On 30 Jun 2011, at 10:51, Chris Bizer wrote:
Hi Ruben,
thank you for your detailed feedback.
Of course it is always a question of taste how you prefer to express
data translation rules, and I agree that simple mappings can also be
expressed using standard OWL constructs.
When designing the R2R mapping language, we first analyzed the real-world
requirements that arise if you try to properly integrate data from existing
Linked Data on the Web. We summarize our findings in Section 5 of the
following paper
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf
As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not
very good at.
Other reasons why we chose to base the mapping language on SPARQL were
that:
1. More and more developers know SPARQL, which makes it easier for them
to learn R2R.
2. We want to be able to translate large amounts (billions of triples in
the mid-term) of messy, inconsistent Web data, and from our experience
with the BSBM Benchmark we have the feeling that SPARQL engines are more
suitable for this task than current reasoning engines, due to their
performance problems as well as their problems dealing with inconsistent
data.
I disagree with you that R2R mappings are not suitable for being
exchanged on the Web. On the contrary, they were specifically designed
for being published and discovered on the Web, and they allow partial
mappings from different sources to be easily combined (see the paper
above for details).
I think your argument about the portability of mappings between
different tools is currently only partially valid. If I as an
application developer want to get a job done, what does it help me if I
can exchange mappings between different tools that all don't get the job
done?
Also note that with LDIF we aim to provide for identity resolution in
addition to schema mapping. It is well known that identity resolution in
practical settings requires rather complex matching heuristics (see the
Silk papers for details about the different matchers that are usually
employed), and identity resolution is again a topic where reasoning
engines don't have too much to offer.
But again, there are different ways and tastes when it comes to
expressing mapping rules and identity resolution heuristics. R2R and
Silk LSL are our approaches to getting the job done, and we are of
course happy if other people provide working solutions for the task of
integrating and cleansing messy data from the Web of Linked Data, and
happy to compare our approach with theirs.
Cheers,
Chris
--
******************************************
Gianluca Correndo
Research fellow IAM group
Electronic and Computer Science
University of Southampton
******************************************