Re: 15 Ways to Think About Data Quality (Just for a Start)

Kingsley Idehen Tue, 12 Apr 2011 12:03:36 -0700

On 4/12/11 1:52 PM, glenn mcdonald wrote:

    You continue to imply that seeing subjectively imperfect data
    projected via a data oriented tool is problematic re., your "total
    data experience" world view.
I continue to think it's hilarious that you consider it "subjectively imperfect" that your dataset says Michael Jackson and Michael Rodrick are the same person. What would constitute "objectively imperfect" to you?
So yes, I think you should feel a little embarrassed about broadcasting links to a demo in which the very first piece of data one sees is obviously wrong. You've got billions of entities in dbpedia, and the technology doesn't care which one you pick, so surely you could pick one where the errors aren't as prominent. The fact that you didn't, and don't seem to care, sends a message about your attitude towards data.

Simple exercise.

Assumptions:

1. an individual (neither you nor I) stumbles across one of my demonstrations
2. your characterization of the demo producer (me) is 100% accurate
3. the data beholder or consumer seeks to look at the data differently, modulo my "data quality ambivalence".

Nothing about the DBMS hosting the datasets (where each has a Named Graph IRI) prevents the beholder or consumer from achieving the following via the available data access endpoints:

1. Accessing and altering the source query or SPARQL protocol URL -- I seldom publish a demo where the routes to the actual query and data sources aren't machine and/or human discernible (note the use of footers, <link/> in <head/>, and HTTP response headers in this regard)

2. Adding or removing pragmas re. inference context (owl:sameAs expansion, invocation of fuzzy InverseFunctionalProperty rules, or a combination of both) as part of the view alteration quest outlined above

3. Viewing original or actual query results via alternative tools that can process HTTP response payloads -- remember, nothing about SPARQL mandates RDF as the sole query results format across SELECT, DESCRIBE, or CONSTRUCT queries

4. Sharing a new query, new result set, new data presentation, etc. via a URL as part of an evolving conversation about the data in question.
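
To make the four steps above concrete, here is a minimal sketch of how a beholder might rebuild a demo's SPARQL protocol URL, toggle owl:sameAs expansion, and ask for a different results format. Everything in it is an illustrative assumption rather than the query behind any particular demo: the endpoint URL, the sample query, and the helper names are made up for this example. The DEFINE line is a Virtuoso-specific pragma, so it only makes sense against a Virtuoso endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint; substitute whichever SPARQL endpoint the demo links to.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumed sample query, not the query used by any actual demo.
BASE_QUERY = """SELECT ?p ?o
WHERE { <http://dbpedia.org/resource/Michael_Jackson> ?p ?o }
LIMIT 25"""


def sparql_url(query, same_as=False, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET URL, optionally with sameAs expansion."""
    if same_as:
        # Virtuoso-specific pragma; other SPARQL engines will reject it.
        query = 'DEFINE input:same-as "yes"\n' + query
    return endpoint + "?" + urlencode({"query": query})


def accept_header(kind):
    """Map a short name to a media type to send in the HTTP Accept header."""
    media_types = {
        "json": "application/sparql-results+json",
        "xml": "application/sparql-results+xml",
        "csv": "text/csv",
    }
    return media_types[kind]


# Two views of the same data: without and with owl:sameAs expansion.
plain = sparql_url(BASE_QUERY)
expanded = sparql_url(BASE_QUERY, same_as=True)
```

Either URL can be pasted into a mail message or fetched with any HTTP client, with the value from accept_header() sent as the Accept header to pick the results format. No request is made by the sketch itself.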

What I outline above lies at the very core of every demo I produce and the very core of Virtuoso's architecture. The problem in our eyes (at OpenLink) boils down to dealing with data quality subjectivity at InterWeb scales, amongst other matters outside the realm of this particular conversation.

Hopefully you know that we already did this entire loop in ODBC, JDBC, OLE-DB, ADO.NET land eons ago. The limitations inherent in the aforementioned realms heavily influenced Virtuoso's architecture and, as a result, every single demo URL that I share. Note, there is a little more to every URL I publish. I am most interested in helping people appreciate URIs as immensely flexible Data Conductors since (in my world view) Data == New Electricity.

Remember, I do subscribe to the mantra: Data is like Wine while Application code is like Fish. A Good (Cool) URL or URI should be able to stand the test of time :-)





--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
