Hi,
I thought I could share some remarks on the topic. First of all, well done on the release of LDIF; it's an interesting piece of work and it's dearly needed. I have started to release a bit of my work too, although it's at a very early stage (https://github.com/correndo/mediation).

On 6/30/11 10:49 AM, Ruben Verborgh wrote:
Hi Chris,

Thanks for the fast and detailed reply, it's a very interesting discussion.

Indeed, there are several ways for mapping and identity resolution.
But what strikes me is that people in the community seem to be insufficiently 
aware of the possibilities and performance of current reasoners.
About identity resolution.
Silk is a nice framework for discovering identity equivalences, although I think that for exploiting such equivalences a more distributed approach should be preferred. An approach where the links among entities are discovered (no matter with what tool) and *shared* would be more organic to an architecture of distributed data publishing.
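To make that concrete (just a sketch; the dataset names and namespace are hypothetical), discovered links could be published as a voiD linkset that any consumer can fetch and merge, independently of the tool that produced them:

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/linksets#> .  # hypothetical namespace

# A set of discovered owl:sameAs links between two (hypothetical) datasets,
# published as plain RDF so that any tool can reuse it.
:dbpedia2geonames a void:Linkset ;
    void:linkPredicate  owl:sameAs ;
    void:subjectsTarget :DBpedia ;
    void:objectsTarget  :Geonames .
```

Publishing the links as data, rather than keeping them inside one tool's configuration, is what makes the approach distributed.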

About reasoners.
I guess on this issue one could distinguish by where a given reasoner is applied. Within Linked Data, where the amount of data is assumed to be huge, applying a reasoner is usually considered impractical. Reasoners just don't scale as well as one would like, although some triple stores are showing good performance (OWLIM, 4sr, and others).

As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not very
good at.

Oh yes, they are. All the transformations needed in your paper can be performed by 
at least two reasoners, cwm [1] and EYE [2], using built-ins [3]. Included are 
regular expressions, datatype transformations…
Frankly, every transform in the R2R example can be expressed as an N3 rule.
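For example (a sketch only; I'm reusing the genes:/smwcat: prefixes from the R2R paper, with made-up namespace URIs), the Gene class mapping becomes a one-line N3 rule that cwm or EYE can apply directly:

```n3
@prefix genes:  <http://example.org/genes#> .   # hypothetical URI
@prefix smwcat: <http://example.org/smwcat#> .  # hypothetical URI

# "Everything that is a genes:gene is also a smwcat:Gene."
{ ?s a genes:gene . } => { ?s a smwcat:Gene . } .
```

Built-ins such as string:matches or math:sum can appear in the antecedent when property values need transforming, not just renaming.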
Logic formalisms can be applied to structural data transformation, although it sounds like a bit of overkill. I think the real issue here is to find the right tool for the right job. If we have heavyweight ontologies that differ conceptually from one another, then a reasoner is the right tool. But what if we're dealing with different data schemas that don't require complex reasoning?

There are, I think, two different levels that can be aligned by two different formalisms: RDF and OWL. Aligning RDF graphs is something that has little to do with description logics; the semantics is inscribed in the structure, and structural alignments are therefore called for. A preliminary work I published [1] was based on graph rewriting; it handles query rewriting and was designed to be a lightweight approach (schema alignment).

On the use of pattern literals: it amounts to using RDF to describe a string whose content's semantics is defined elsewhere. It just doesn't sound right. But again, even using RDF and reification to describe at least the basic graph patterns [1] doesn't solve the problem of semantic elicitation: the interpretation of an alignment is still relative to a particular tool.
So, instead of writing literals like this:

mp:Gene
    r2r:sourcePattern "?SUBJ a genes:gene";
    r2r:targetPattern "?SUBJ a smwcat:Gene".

I would have written a chunk of RDF pattern graph like this:

_:mapping mediation:lhs [ a rdf:Statement ;
              rdf:subject   _:SUBJ ;
              rdf:predicate rdf:type ;
              rdf:object    genes:gene ] ;
          mediation:rhs [ a rdf:Statement ;
              rdf:subject   _:SUBJ ;
              rdf:predicate rdf:type ;
              rdf:object    smwcat:Gene ] .


For aligning OWL ontologies there have been a number of proposals, EDOAL [2] and C-OWL [3] to name a few, not counting the already mentioned properties defined in OWL itself (owl:sameAs, owl:equivalentProperty, owl:equivalentClass). The question for any OWL alignment formalism is rather to find different profiles of complexity that fit different application cases.
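For the simplest such profile, plain OWL already goes a long way (a sketch with hypothetical ex1:/ex2: resources, purely for illustration):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex1: <http://example.org/one#> .  # hypothetical namespace
@prefix ex2: <http://example.org/two#> .  # hypothetical namespace

# Class-, property-, and instance-level correspondences in standard OWL.
ex1:Person owl:equivalentClass    ex2:Human .
ex1:name   owl:equivalentProperty ex2:label .
ex1:alice  owl:sameAs             ex2:author42 .
```

Richer profiles (EDOAL, C-OWL) are only needed when correspondences stop being one-to-one.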

[1] http://eprints.ecs.soton.ac.uk/18370/
[2] http://alignapi.gforge.inria.fr/edoal.html
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.9326&rep=rep1&type=pdf



If I as an application developer
want to get a job done, what does it help me if I can exchange mappings
between different tools that all don't get the job done?

Because different tools can contribute different results, and if you use a 
common language and idiom, they all can work with the same data and metadata.

more and more developers know SPARQL which makes it easier for them to learn 
R2R.

The set of developers who know SPARQL is a proper subset of those who know plain 
RDF, which is what I suggest using. And even if rules are necessary, N3 is only 
a small extension of RDF.

Benchmark we have the feeling that SPARQL engines are more suitable for
this task than current reasoning engines due to their performance problems
as well as problems to deal with inconsistent data.

The extremely solid performance [4] of EYE is too little known. It can achieve 
things in linear time that other reasoners can never solve.

But my main point is semantics. Why make a new system with its own meanings and 
interpretations, when there is so much to do with plain RDF and its widely 
known vocabularies (RDFS, OWL)?
Ironically, a tool that contributes to the reconciliation of different RDF 
sources does not use common vocabularies to express well-known relationships.

Cheers,

Ruben

[1] http://www.w3.org/2000/10/swap/doc/cwm.html
[2] http://eulersharp.sourceforge.net/
[3] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[4] http://eulersharp.sourceforge.net/2003/03swap/dtb-2010.txt

On 30 Jun 2011, at 10:51, Chris Bizer wrote:

Hi Ruben,

thank you for your detailed feedback.

Of course it is always a question of taste how you prefer to express data
translation rules and I agree that simple mappings can also be expressed
using standard OWL constructs.

When designing the R2R mapping language, we first analyzed the real-world
requirements that arise if you try to properly integrate data from existing
Linked Data on the Web. We summarize our findings in Section 5 of the
following paper
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf
As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not very
good at.

Other reasons why we chose to base the mapping language on SPARQL were
that:

1. more and more developers know SPARQL which makes it easier for them to
learn R2R.
2. we want to be able to translate large amounts (billions of triples in the
mid-term) of messy, inconsistent Web data, and from our experience with the
BSBM Benchmark we have the feeling that SPARQL engines are more suitable for
this task than current reasoning engines, due to their performance problems
as well as problems dealing with inconsistent data.

I disagree with you that R2R mappings are not suitable for being exchanged
on the Web. On the contrary, they were especially designed for being published
and discovered on the Web, and they allow partial mappings from different sources
to be easily combined (see the paper above for details).

I think your argument about the portability of mappings between different
tools is currently only partially valid. If I as an application developer
want to get a job done, what does it help me if I can exchange mappings
between different tools that all don't get the job done?

Also note, that we aim with LDIF to provide for identity resolution in
addition to schema mapping. It is well known that identity resolution in
practical settings requires rather complex matching heuristics (see Silk
papers for details about different matchers that are usually employed) and
identity resolution is again a topic where reasoning engines don't have too
much to offer.

But again, there are different ways and tastes about how to express mapping
rules and identity resolution heuristics. R2R and Silk LSL are our
approaches to getting the job done and we are of course happy if other
people provide working solutions for the task of integrating and cleansing
messy data from the Web of Linked Data and are happy to compare our approach
with theirs.

Cheers,

Chris




--
******************************************
 Gianluca Correndo
 Research fellow IAM group
 Electronic and Computer Science
 University of Southampton
******************************************


