Paul A Houle wrote:
Schema Last vs. Schema First :-) This is an RDF virtue that, once broadly
understood across the more traditional DBMS realms, will work wonders
for appreciation of RDF-based Linked Data.
On Thu, Sep 17, 2009 at 7:23 AM, Kingsley Idehen
<kide...@openlinksw.com> wrote:
This is basically an aspect of the whole Linked Data meme that is
lost on too many.
I've got to thank the book by Allemang and Hendler
for setting me straight about data modeling in RDF. RDFS and OWL are
based on a system of duck typing that turns conventional object or
object-relational thinking inside out. It's not necessarily good or
bad, but it's really different. Even though types matter,
predicates come before types because using predicate A can make object
B become a member of type C, even if A is never explicitly put in
It's about a concrete conceptual layer that isn't oblivious to context. In
some quarters this is actually called a Context Model Database.
Looking at the predicates in RDFS or OWL and not understanding the
whole, it's pretty easy to be like "oh, this isn't too different
from a relational database" and miss the point that RDFS and OWL are much
more about inference (creating new triples) than they are about
constraints or the physical layout of the data.
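To make the "inference, not constraints" point concrete, here is a minimal sketch in plain Python (no RDF library) of how rdfs:domain and rdfs:range create new rdf:type triples just because a predicate was used. The ex: names are made up for illustration, and the function only handles one domain and one range per predicate; a real RDFS reasoner does a good deal more.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" names are hypothetical and exist only for this example.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range.

    Simplification: at most one domain and one range per predicate.
    """
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            # Using the predicate makes the subject a member of the domain class.
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            # Using the predicate makes the object a member of the range class.
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred - set(triples)


schema_and_data = [
    ("ex:hasCapital", RDFS_DOMAIN, "ex:Country"),
    ("ex:hasCapital", RDFS_RANGE, "ex:City"),
    ("ex:France", "ex:hasCapital", "ex:Paris"),
]

new_triples = infer_types(schema_and_data)
for triple in sorted(new_triples):
    print(triple)
```

Nobody ever said that ex:Paris is a city; it became one by appearing as the object of ex:hasCapital. Nothing is rejected or validated here, which is the difference from a relational constraint.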
Yes, but the katamari can be confined to a specific data space that is
owned and controlled by a particular person, who has a specific world
view. As long as the axioms are partitioned across data spaces, and the
RDF store is capable of processing within said confines, everyone is
happy. Trouble starts when the claims become global facts imposed on
everyone else that has access to the data space.
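One way to picture axioms confined to a data space is the toy sketch below, in plain Python. It assumes a store that keeps triples and rules per named space; the space names, the ex: predicates and the rule format (A, B), meaning "x A y implies x B y", are all invented for illustration and are not how any particular RDF store works.

```python
def infer_in_space(spaces, rules, space_name):
    """Apply only the rules owned by `space_name` to that space's triples.

    Single pass only (no chaining of rules), to keep the sketch short.
    """
    triples = set(spaces[space_name])
    for a, b in rules.get(space_name, []):
        triples |= {(s, b, o) for s, p, o in spaces[space_name] if p == a}
    return triples


# Two hypothetical data spaces holding the same raw statement.
spaces = {
    "alice": {("ex:x", "ex:likes", "ex:y")},
    "bob": {("ex:x", "ex:likes", "ex:y")},
}

# Only alice holds the world view that "likes" implies "endorses".
rules = {"alice": [("ex:likes", "ex:endorses")]}

alice_view = infer_in_space(spaces, rules, "alice")
bob_view = infer_in_space(spaces, rules, "bob")
```

The inferred ex:endorses triple shows up in alice's view and nowhere else, so her axiom never becomes a global fact for bob.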
One consequence of this is that using an existing predicate can drag
in a lot more baggage than you might want; it's pretty easy to get
the inference engine to infer too much, and false inferences can
snowball like a katamari.
Yep! The trouble is that OWL appreciation is low, but ultimately this
is where the magic really lies. This is how URIs (Data Source Names)
will be distinguished: by the data highway smarts they expose.
For example: I am traveling from Boston to Detroit; which route (among
many) gets me there quickest, given my specific preferences?
A lot of people are in the habit of reusing vocabularies and seem to
forget that the natural answer to most RDF modeling problems is to
create a new predicate. OWL has a rich set of mechanisms that can
tell systems that
x A y -> x B y
where A is your new predicate and B is a well-known predicate. Once
you merge two "almost-but-not-the-same" things by actually using the
same predicate, it's very hard to fix the damage. If you use
inference, it's easy to change your mind.
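The "x A y -> x B y" rule above is what rdfs:subPropertyOf gives you. Below is a minimal plain-Python sketch of why that is easier to undo than reusing B directly: the raw data only ever contains A, and the B triples are derived. The predicate ex:worksWith and the helper materialize are hypothetical names for this example; foaf:knows just stands in for "a well-known predicate".

```python
def materialize(data, mappings):
    """Apply 'x A y -> x B y' for each (A, B) in mappings until nothing new appears."""
    result = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            for a, b in mappings:
                if p == a and (s, b, o) not in result:
                    result.add((s, b, o))
                    changed = True
    return result


# Raw data uses only the new, narrow predicate A.
data = {("ex:alice", "ex:worksWith", "ex:bob")}

# Today we believe A implies the well-known predicate B ...
with_mapping = materialize(data, [("ex:worksWith", "foaf:knows")])

# ... tomorrow we change our mind: drop the mapping and re-derive.
without_mapping = materialize(data, [])
```

Because the mapping lives outside the data, dropping it restores exactly the original triples. Had the data been written with foaf:knows from the start, there would be no way to tell which statements were really "works with".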
Data cleansing is required because there are no absolute truths and we
all see the same thing differently. What RDF offers, above all
else, is the ability to protect our natural tendencies (seeing the same
things differently) by inverting the traditional model, where inertia is
introduced as a result of different views or perspectives.
It may be different with other data sets, but data cleaning is
absolutely essential when working with dbpedia if you want to make
Heterogeneity is the spice of life for a reason. Even our DNA rewards us
when we fuse afar (rather than inbreed) etc. :-)
For instance, all of the time people build bizapps and they need a
list of US states... Usually we go and cut and paste one from
somewhere... But now I've got dbpedia and I should be able to do this
systematically. There's a category in wikipedia for that...
if you ignore the subcategories and just take the actual pages, it's
(almost) what you need, except for some weirdos like
and one state that's got a disambiguator in the name:
Georgia (U.S. state)
It's not hard to clean up this list, but it takes some effort, and
ultimately you're probably going to materialize something new.
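A rough sketch of that cleanup in Python, assuming the category members have already been fetched as a list of page titles. The input list and the exclusion entry are placeholders made up for this example (the real odd pages have to be identified by hand); only the "Georgia (U.S. state)" title comes from the case above.

```python
import re

# Matches a trailing parenthetical disambiguator such as " (U.S. state)".
DISAMBIGUATOR = re.compile(r"\s*\([^)]*\)$")


def clean_titles(titles, exclude):
    """Drop manually excluded pages and strip trailing disambiguators."""
    cleaned = []
    for title in titles:
        if title in exclude:
            continue
        cleaned.append(DISAMBIGUATOR.sub("", title))
    return sorted(cleaned)


# Placeholder input: stands in for the pages in the category.
raw_titles = [
    "Alabama",
    "Georgia (U.S. state)",
    "Some page that is not a state",  # hypothetical stray category member
]

# Manual exclusion list, maintained by whoever owns the cleaned data set.
not_states = {"Some page that is not a state"}

states = clean_titles(raw_titles, not_states)
print(states)
```

The result is a new, materialized list, which is the point: the exclusions and the renaming are editorial decisions that do not live in the source data.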
Yes, something new, in a new data space that is still plugged into the Web.
These sorts of issues even turn up in highly clean data sets. Once I
built a webapp with a list of countries that was used to draw a
dropdown list, but the dropdown was excessively wide and busted the
layout of the site. The list was that wide because a few
authoritarian countries have long and flowery names. The
transformation from
Democratic People's Republic of Korea -> North Korea
improved the usability of the site while eliminating Orwellian
language. This kind of "fit and finish" is needed to make quality
sites, and semweb systems are going to need automated and manual ways
of fixing this so that "Web 3.0" looks like a step forward, not a
Web 3.0 is a step forward, but we need to know where the step is :-) As
you know, it ain't about code; it's about data structures combined with
ubiquitous access and reference.
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com