Hi,
I thought I could share some remarks on the topic. First of all, well done on the release of LDIF; it's an interesting piece of work and it's dearly needed. I have started to release a bit of my work too, although it's at a very early stage (https://github.com/correndo/mediation).

On 6/30/11 10:49 AM, Ruben Verborgh wrote:
Hi Chris,

Thanks for the fast and detailed reply, it's a very interesting discussion.

Indeed, there are several ways for mapping and identity resolution.
But what strikes me is that people in the community seem to be insufficiently 
aware of the possibilities and performance of current reasoners.
About identity resolution.
Silk is a nice framework for discovering identity equivalences, although I think that for exploiting such equivalences a more distributed approach should be preferred. An approach where the links among entities are discovered (no matter with what tool) and *shared* would be more organic to an architecture of distributed data publishing.
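To make that concrete (just a sketch; the dataset names and namespace are hypothetical), discovered links could be published as a voiD linkset that any consumer can fetch and merge, independently of the tool that produced them:

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/linksets#> .  # hypothetical namespace

# A set of discovered owl:sameAs links between two (hypothetical) datasets,
# published as plain RDF so that any tool can reuse it.
:dbpedia2geonames a void:Linkset ;
    void:linkPredicate  owl:sameAs ;
    void:subjectsTarget :DBpedia ;
    void:objectsTarget  :Geonames .
```

Publishing the links as data, rather than keeping them inside one tool's configuration, is what makes the approach distributed.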

About reasoners.
I guess on this issue one could distinguish by where a given reasoner is applied. Within Linked Data, where the amount of data is assumed to be huge, applying a reasoner is usually considered impractical. Reasoners just don't scale as well as one would like, although some triple stores are showing good performance (OWLIM, 4sr, and others).

As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not very
good at.

Oh yes, they are. All the transformations needed in your paper can be performed by 
at least two reasoners, cwm [1] and EYE [2], using built-ins [3]. Included are 
regular expressions, datatype transformations…
Frankly, every transform in the R2R example can be expressed as an N3 rule.
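For example (a sketch only; I'm reusing the genes:/smwcat: prefixes from the R2R paper, with made-up namespace URIs), the Gene class mapping becomes a one-line N3 rule that cwm or EYE can apply directly:

```n3
@prefix genes:  <http://example.org/genes#> .   # hypothetical URI
@prefix smwcat: <http://example.org/smwcat#> .  # hypothetical URI

# "Everything that is a genes:gene is also a smwcat:Gene."
{ ?s a genes:gene . } => { ?s a smwcat:Gene . } .
```

Built-ins such as string:matches or math:sum can appear in the antecedent when property values need transforming, not just renaming.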
Logic formalisms can be applied to structural data transformation, although it sounds like a bit of overkill. I think the real issue here is to find the right tool for the right job. If we have heavyweight ontologies that differ conceptually from one another, then a reasoner is the right tool. But what if we're dealing with different data schemas that don't require complex reasoning?

There are, I think, two different levels that can be aligned by two different formalisms: RDF and OWL. Aligning RDF graphs is something that has little to do with description logics; the semantics is inscribed in the structure, and structural alignments are therefore called for. A preliminary work I published [1] was based on graph rewriting; it handles query rewriting and was designed to be a lightweight approach (schema alignment).

On the use of pattern literals: it amounts to using RDF to describe a string whose content's semantics is defined elsewhere. It just doesn't sound right. But again, even using RDF and reification to describe at least the basic graph patterns [1] doesn't solve the problem of semantic elicitation: the interpretation of an alignment is still relative to a particular tool.
So, instead of writing literals like this:

mp:Gene
    r2r:sourcePattern "?SUBJ a genes:gene";
    r2r:targetPattern "?SUBJ a smwcat:Gene".

I would have written a chunk of RDF pattern graph like this:

_:mapping mediation:lhs [ a rdf:Statement ;
              rdf:subject   _:SUBJ ;
              rdf:predicate rdf:type ;
              rdf:object    genes:gene ] ;
          mediation:rhs [ a rdf:Statement ;
              rdf:subject   _:SUBJ ;
              rdf:predicate rdf:type ;
              rdf:object    smwcat:Gene ] .


For aligning OWL ontologies there have been a number of proposals, EDOAL [2] and C-OWL [3] to name a few, not counting the already mentioned properties defined in OWL itself (owl:sameAs, owl:equivalentProperty, owl:equivalentClass). The question for any OWL alignment formalism is rather to find different profiles of complexity that fit different application cases.
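For the simplest such profile, plain OWL already goes a long way (a sketch with hypothetical ex1:/ex2: resources, purely for illustration):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex1: <http://example.org/one#> .  # hypothetical namespace
@prefix ex2: <http://example.org/two#> .  # hypothetical namespace

# Class-, property-, and instance-level correspondences in standard OWL.
ex1:Person owl:equivalentClass    ex2:Human .
ex1:name   owl:equivalentProperty ex2:label .
ex1:alice  owl:sameAs             ex2:author42 .
```

Richer profiles (EDOAL, C-OWL) are only needed when correspondences stop being one-to-one.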

[1] http://eprints.ecs.soton.ac.uk/18370/
[2] http://alignapi.gforge.inria.fr/edoal.html
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.9326&rep=rep1&type=pdf



If I as an application developer
want to get a job done, what does it help me if I can exchange mappings
between different tools that all don't get the job done?

Because different tools can contribute different results, and if you use a 
common language and idiom, they all can work with the same data and metadata.

more and more developers know SPARQL which makes it easier for them to learn 
R2R.

The set of developers who know SPARQL is a proper subset of those who know plain 
RDF, which is what I suggest using. And even if rules are necessary, N3 is only 
a small extension of RDF.

Benchmark we have the feeling that SPARQL engines are more suitable for
this task than current reasoning engines due to their performance problems
as well as problems to deal with inconsistent data.

The extremely solid performance [4] of EYE is too little known. It can achieve 
things in linear time that other reasoners can never solve.

But my main point is semantics. Why make a new system with its own meanings and 
interpretations, when there is so much to do with plain RDF and its widely 
known vocabularies (RDFS, OWL)?
Ironically, a tool that contributes to the reconciliation of different RDF 
sources does not use common vocabularies to express well-known relationships.

Cheers,

Ruben

[1] http://www.w3.org/2000/10/swap/doc/cwm.html
[2] http://eulersharp.sourceforge.net/
[3] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[4] http://eulersharp.sourceforge.net/2003/03swap/dtb-2010.txt

On 30 Jun 2011, at 10:51, Chris Bizer wrote:

Hi Ruben,

thank you for your detailed feedback.

Of course it is always a question of taste how you prefer to express data
translation rules and I agree that simple mappings can also be expressed
using standard OWL constructs.

When designing the R2R mapping language, we first analyzed the real-world
requirements that arise if you try to properly integrate data from existing
Linked Data on the Web. We summarize our findings in Section 5 of the
following paper
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf
As you can see the data translation requires lots of structural
transformations as well as complex property value transformations using
various functions. All things that current logical formalisms are not very
good at.

Other reasons why we chose to base the mapping language on SPARQL were
that:

1. more and more developers know SPARQL which makes it easier for them to
learn R2R.
2. we want to be able to translate large amounts (billions of triples in the
mid-term) of messy, inconsistent Web data, and from our experience with the
BSBM Benchmark we have the feeling that SPARQL engines are more suitable for
this task than current reasoning engines, due to their performance problems
as well as problems dealing with inconsistent data.

I disagree with you that R2R mappings are not suitable for being exchanged
on the Web. On the contrary, they were especially designed for being published
and discovered on the Web, and they allow partial mappings from different sources
to be easily combined (see the paper above for details).

I think your argument about the portability of mappings between different
tools is currently only partially valid. If I as an application developer
want to get a job done, what does it help me if I can exchange mappings
between different tools that all don't get the job done?

Also note, that we aim with LDIF to provide for identity resolution in
addition to schema mapping. It is well known that identity resolution in
practical settings requires rather complex matching heuristics (see Silk
papers for details about different matchers that are usually employed) and
identity resolution is again a topic where reasoning engines don't have too
much to offer.

But again, there are different ways and tastes about how to express mapping
rules and identity resolution heuristics. R2R and Silk LSL are our
approaches to getting the job done and we are of course happy if other
people provide working solutions for the task of integrating and cleansing
messy data from the Web of Linked Data and are happy to compare our approach
with theirs.

Cheers,

Chris




--
******************************************
 Gianluca Correndo
 Research fellow IAM group
 Electronic and Computer Science
 University of Southampton
******************************************


