date:20110412

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Norman Gray


Glenn and all, greetings.

On 2011 Apr 9, at 03:10, glenn mcdonald wrote:

 I don't think data quality is an amorphous, aesthetic, hopelessly subjective
 topic. Data beauty might be subjective, and the same data may have
 different applicability to different tasks, but there are a lot of obvious
 and straightforward ways of thinking about the quality of a dataset
 independent of the particular preferences of individual beholders. Here are
 just some of them:

This is an excellent list.  I think only a minority of these qualities could be 
scored precisely, but I think all of them could be scored on some 
awful-to-excellent scale, so that while they may not be quite objective 
metrics, they're at least clearly debatable.

Complete objectivity is probably impossible here -- inevitable in a world where 
the concept of 'Rome' means significantly different things to the local 
authority, the ancient historian, and the tourist board.  But 'solves my 
problem well' is a pretty good substitute.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Muriel Foulonneau

Hi Glenn,

This reminds me of some established frameworks.
Here is a list of criteria gathered from the literature on metadata quality
[1]. It is not exhaustive. Besiki Stvilia has also worked on a more
comprehensive framework [2]. More has been done on information quality in
general. However, I suspect they do not cover all the aspects you mention,
in particular those relating to the ontology used and to linkage.

*Completeness*: In a complete metadata record, the learning object is
described using all the fields that are relevant to describe it.

*Accuracy*: In an accurate metadata record, the data contained in the fields
correspond to the object that is being described.

*Provenance*: The provenance parameter reflects the degree of trust that you
have in the creator of the metadata record.

*Conformance to expectations*: This parameter measures how well the data
contained in the record let you gain knowledge about the learning object
without actually seeing the object.

*Logical consistency and coherence*: This parameter reflects two measures.
Consistency measures whether the values chosen for different fields in the
record agree with each other; coherence measures whether all the fields
describe the same object.

*Timeliness*: This parameter measures how up-to-date the metadata record is
compared with changes in the object.

*Accessibility*: This parameter measures how well you are able to understand
the content of the metadata record.

Muriel Foulonneau

[1] Thomas R. Bruce and Diane I. Hillmann, 'The Continuum of Metadata Quality'

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.8053&rep=rep1&type=pdf
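The *Completeness* criterion above can be expressed as a simple ratio. Below is a minimal Python sketch. It assumes a record is a plain dictionary and that someone has already decided which fields are relevant for this kind of object. The field names are hypothetical, and the cited frameworks may weight fields rather than simply count them.

```python
from typing import Iterable, Mapping

# Values treated as "not filled in"; this choice is an assumption of the sketch.
EMPTY_VALUES = (None, "", [], {})

def completeness(record: Mapping[str, object], relevant_fields: Iterable[str]) -> float:
    """Share of the relevant fields that carry a non-empty value, from 0.0 to 1.0."""
    fields = list(relevant_fields)
    if not fields:
        return 1.0  # nothing is relevant, so nothing can be missing
    filled = sum(1 for field in fields if record.get(field) not in EMPTY_VALUES)
    return filled / len(fields)

# Hypothetical learning-object record and field list, for illustration only.
record = {"title": "Intro to RDF", "creator": "", "language": "en"}
relevant = ["title", "creator", "language", "description"]
print(completeness(record, relevant))  # 0.5
```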

On Sat, Apr 9, 2011 at 3:10 AM, glenn mcdonald <gmcdon...@furia.com> wrote:

 I don't think data quality is an amorphous, aesthetic, hopelessly
 subjective topic. Data beauty might be subjective, and the same data may
 have different applicability to different tasks, but there are a lot of
 obvious and straightforward ways of thinking about the quality of a dataset
 independent of the particular preferences of individual beholders. Here are
 just some of them:

 1. Accuracy: Are the individual nodes that refer to factual information
 factually and lexically correct? Like, is Chicago spelled "Chigaco", or does
 the dataset say its population is 2.7?

 2. Intelligibility: Are there human-readable labels on things, so you can
 tell what a thing is when you're looking at it? Is there a model, so you can
 tell what questions you can ask? If a thing has multiple labels (or a set of
 owl:sameAs things have multiple labels), do you know which (or if) one is
 canonical?

 3. Referential correspondence: If a set of data points represents some set
 of real-world referents, is there one and only one point per referent? If
 you have 9,780 data points representing cities, but 5 of them are "Chicago",
 "Chicago, IL", "Metro Chicago", "Metropolitain Chicago, Illinois" and
 "Chicagoland", that's bad.

 4. Completeness: Where you have data representing a clear finite set of
 referents, do you have them all? All the countries, all the states, all the
 NHL teams, etc? And if you have things related to these sets, are those
 projections complete? Populations of every country? Addresses of arenas of
 all the hockey teams?

 5. Boundedness: Where you have data representing a clear finite set of
 referents, is it unpolluted by other things? E.g., can you get a list of
 current real countries, not mixed with former states or fictional empires or
 administrative subdivisions?

 6. Typing: Do you really have properly typed nodes for things, or do you
 just have literals? The first president of the US was not
 "George Washington"^^xsd:string; it was a person whose name-renderings include
 "George Washington". Your ability to ask questions will be constrained or
 crippled if your data doesn't know the difference.

 7. Modeling correctness: Is the logical structure of the data properly
 represented? Graphs are relational databases without the crutch of rows;
 if you screw up the modeling, your queries will produce garbage.

 8. Modeling granularity: Did you capture enough of the data to actually
 make use of it? :us :president :george_washington isn't exactly wrong, but
 it's pretty limiting. Model presidencies, with their dates, and you've got
 much more powerful data.

 9. Connectedness: If you're bringing together datasets that used to be
 separate, are the join points represented properly? Is the US from your
 country list the same as (or owl:sameAs) the US from your list of
 presidencies and the US from your list of world cities and their
 populations?

 10. Isomorphism: If you're bringing together datasets that used to be
 separate, are their models reconciled? Does an album contain songs, or does
 it contain tracks which are publications of recordings of songs, or
 something else? If each data point answers this question differently, even
 simple-seeming queries may be intractable.

 11. Currency: Is the data up-to-date?

 12. Directionality: Can you

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Dave Reynolds

On Fri, 2011-04-08 at 21:10 -0400, glenn mcdonald wrote:
 I don't think data quality is an amorphous, aesthetic, hopelessly
 subjective topic. Data beauty might be subjective, and the same data
 may have different applicability to different tasks, but there are a
 lot of obvious and straightforward ways of thinking about the quality
 of a dataset independent of the particular preferences of individual
 beholders. Here are just some of them:
 
 
 1. Accuracy: Are the individual nodes that refer to factual
 information factually and lexically correct? Like, is Chicago spelled
 "Chigaco", or does the dataset say its population is 2.7?
 
 
 2. Intelligibility: Are there human-readable labels on things, so you
 can tell what a thing is when you're looking at it? Is there a model, so
 you can tell what questions you can ask? If a thing has multiple
 labels (or a set of owl:sameAs things have multiple labels), do you
 know which (or if) one is canonical?
 
 
 3. Referential correspondence: If a set of data points represents some
 set of real-world referents, is there one and only one point per
 referent? If you have 9,780 data points representing cities, but 5 of
 them are "Chicago", "Chicago, IL", "Metro Chicago",
 "Metropolitain Chicago, Illinois" and "Chicagoland", that's bad.
 
 
 4. Completeness: Where you have data representing a clear finite set
 of referents, do you have them all? All the countries, all the states,
 all the NHL teams, etc? And if you have things related to these sets,
 are those projections complete? Populations of every country?
 Addresses of arenas of all the hockey teams?
 
 
 5. Boundedness: Where you have data representing a clear finite set of
 referents, is it unpolluted by other things? E.g., can you get a list
 of current real countries, not mixed with former states or fictional
 empires or administrative subdivisions?
 
 
 6. Typing: Do you really have properly typed nodes for things, or do
 you just have literals? The first president of the US was not
 "George Washington"^^xsd:string; it was a person whose name-renderings
 include "George Washington". Your ability to ask questions will be constrained
 or crippled if your data doesn't know the difference.
 
 
 7. Modeling correctness: Is the logical structure of the data properly
 represented? Graphs are relational databases without the crutch of
 rows; if you screw up the modeling, your queries will produce
 garbage.
 
 
 8. Modeling granularity: Did you capture enough of the data to
 actually make use of it? :us :president :george_washington isn't
 exactly wrong, but it's pretty limiting. Model presidencies, with
 their dates, and you've got much more powerful data.
 
 
 9. Connectedness: If you're bringing together datasets that used to be
 separate, are the join points represented properly? Is the US from
 your country list the same as (or owl:sameAs) the US from your list of
 presidencies and the US from your list of world cities and their
 populations?
 
 
 10. Isomorphism: If you're bringing together datasets that used to be
 separate, are their models reconciled? Does an album contain songs, or
 does it contain tracks which are publications of recordings of songs,
 or something else? If each data point answers this question
 differently, even simple-seeming queries may be intractable.
 
 
 11. Currency: Is the data up-to-date?
 
 
 12. Directionality: Can you navigate the logical binary relationships
 in either direction? Can you get from a country to its presidencies to
 their presidents, or do you have to know to only ask about presidents'
 presidencies' countries? Or worse, do you have to ask every question
 in permutations of directions because some data asserts things one way
 and some asserts it only the other?
 
 
 13. Attribution: If your data comes from multiple sources, or in
 multiple batches, can you tell which came from where?
 
 
 14. History: If your data has been edited, can you tell how and by
 whom?
 
 
 15. Internal consistency: Do the populations of your counties add up
 to the populations of your states? Do the substitutes going into your
 soccer matches balance the substitutes going out?

That's a fantastic list and should be recorded on a wiki somewhere!

A minor quibble: I'm not sure about Directionality. You can follow an RDF
link in both directions (at least in SPARQL and any RDF API I've worked
with).  I would be inclined to generalize and rephrase this as ...

Consistency of modelling: whichever way you make modelling decisions
such as direction of relations (from country to president, from
president to country) it is done consistently so you don't have to ask
many permutations of the same query. 
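Dave's "consistency of modelling" point can be illustrated by rewriting every triple into one agreed direction. After that, a single query pattern covers all of the data. Below is a minimal Python sketch over plain (subject, predicate, object) tuples. The `:presidentOf`/`:hasPresident` predicate pair is a hypothetical vocabulary, not one from any published dataset.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary: canonical predicate -> its inverse.
INVERSES: Dict[str, str] = {":presidentOf": ":hasPresident"}

def canonicalise(triples: Iterable[Triple], inverses: Dict[str, str]) -> Set[Triple]:
    """Rewrite triples stated with an inverse predicate into the canonical direction."""
    to_canonical = {inv: canon for canon, inv in inverses.items()}
    result: Set[Triple] = set()
    for s, p, o in triples:
        if p in to_canonical:
            result.add((o, to_canonical[p], s))  # flip subject and object
        else:
            result.add((s, p, o))
    return result

def mixed_directions(triples: List[Triple], inverses: Dict[str, str]) -> Set[str]:
    """Canonical predicates that the data asserts in both directions."""
    used = {p for _, p, _ in triples}
    return {canon for canon, inv in inverses.items() if canon in used and inv in used}

data = [
    (":washington", ":presidentOf", ":us"),
    (":us", ":hasPresident", ":adams"),  # same relation, stated the other way round
]
print(mixed_directions(data, INVERSES))  # {':presidentOf'}
for triple in sorted(canonicalise(data, INVERSES)):
    print(triple)
```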

Possible additions:

Licensed: the license under which the data can be used is clearly
defined, ideally in a machine checkable way.

Sustainable: there is some credible basis for believing the data will
be maintained as current (e.g. backed by some appropriate organization
or by a sufficiently large group of individuals, has

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 As part of conversations about data, you do need to be able to see the
 subjectively bad to make it subjectively good. What you can't do (which
 is what Glenn does repeatedly) is conflate the tools that actually enable
 you to see the subjectively good, bad, or ugly with said data.


I'm a tool developer with first hand experience, as you put it, too. I'm
not conflating the tools and the data. But the complete data experience is
the product of the tools and the data.

 Is Excel rendered useless because a list of countries with obvious errors
 was presented in the spreadsheet? To an audience of Spreadsheet developers
 (programmers making a Spreadsheet product) that's irrelevant


That attitude is how Excel ended up with essentially no real data-cleaning
tools, which is pathetic. The job of data tools is to mediate between people
and computers, and thus helping people identify and understand and fix and
improve data is just as much the tools' (and tool developers')
responsibility as showing you a list of entity URIs. The list of
data-quality metrics is also effectively a data-tool task list.

 This is why my demos are oriented towards enabling the beholder to disambiguate
 his/her/its quest via filtering applied to entity types and other
 properties.


Which is what I was talking about in Boundedness: does the data have the
properties you need to extract the subset you want. E.g., Danny Ayers
yesterday was trying to make a SPARQL query for Wordnet that found the
planets in the solar system that aren't named after Roman gods. But neither
he nor I could find any way in the data to distinguish actual planets in the
list of solar bodies, so we couldn't quite make it right. That was a data
problem, not a tool problem. But the difficulty of figuring this out, *using*
the tools, was a tool problem.

But of the 17 other qualities on my list + Dave's additions, at least 15 of
them directly bear on the feasibility of using filtering to extract a good
subset out of a flawed corpus.

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 A minor quibble: I'm not sure about Directionality. You can follow an RDF
 link in both directions (at least in SPARQL and any RDF API I've worked
 with).  I would be inclined to generalize and rephrase this as ...

 Consistency of modelling: whichever way you make modelling decisions
 such as direction of relations (from country to president, from
 president to country) it is done consistently so you don't have to ask
 many permutations of the same query.


Yes, inconsistency is the worst kind of directionality problem, but to me
Directionality is still a problem in itself. An RDF browser that shows you
both the incoming and outgoing triples is *addressing* that problem, as is
anything that infers the inverses. But the problem still exists. It's an
artificial skew between the logical properties of the data and the manifest
properties in the system. An alternate data-modeling regime in which both
directions are always explicitly asserted would not have this problem (but
would, obviously, take on a heavier internal-consistency burden as a
result).
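Glenn mentions an extra burden: when both directions are always asserted explicitly, they have to be kept in step. That amounts to checking that every triple has its inverse partner. Below is a minimal Python sketch, again with a hypothetical `:presidentOf`/`:hasPresident` pair and triples as plain tuples.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical inverse pair; either member may appear in the data.
INVERSES: Dict[str, str] = {":presidentOf": ":hasPresident"}

def missing_inverses(triples: Set[Triple], inverses: Dict[str, str]) -> Set[Triple]:
    """Triples that a 'both directions always asserted' regime requires but the data lacks."""
    symmetric = dict(inverses)
    symmetric.update({inv: canon for canon, inv in inverses.items()})
    missing: Set[Triple] = set()
    for s, p, o in triples:
        if p in symmetric:
            expected = (o, symmetric[p], s)
            if expected not in triples:
                missing.add(expected)
    return missing

data = {
    (":washington", ":presidentOf", ":us"),
    (":us", ":hasPresident", ":washington"),
    (":adams", ":presidentOf", ":us"),  # no matching :hasPresident triple
}
print(missing_inverses(data, INVERSES))  # {(':us', ':hasPresident', ':adams')}
```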

I like your additions of Licensing, Sustainability and Authority.

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 9:33 AM, glenn mcdonald wrote:


As part of conversations about data, you do need to be able to see
the subjectively bad to make it subjectively good. What you
can't do (which is what Glenn does repeatedly) is conflate the
tools that actually enable you to see the subjectively good, bad, or
ugly with said data.


I'm a tool developer with first hand experience, as you put it, too. 
I'm not conflating the tools and the data. But the complete data 
experience is the product of the tools and the data.


But who ever told you, or inferred to you, that any LOD demo is about
the Complete Linked Data Experience, let alone the Complete Data
Experience? Who even knows, emphatically, what the so-called Complete
Data Experience actually is? That's as subjective a statement as I've
ever heard. It's the very line that continues to separate us.


I might have my own perception of the aforementioned experience, but I 
have no business enforcing that on anyone else; it's just my world view,
end of story.


Thus, I hold my position re. your subjective conflation of matters.

When people publish demos of their products, they aren't publishing the
demos from your world view; they are publishing them from theirs, first.
Of course, bearing in mind our similarities and disparities as cognitive
beings, there is varied potential for intersection of world views, i.e.,
fusion. Naturally, fusion can occur with varying degrees of friction.




 Is Excel rendered useless because a list of countries with
obvious errors was presented in the spreadsheet? To an audience of
Spreadsheet developers (programmers making a Spreadsheet product)
that's irrelevant


That attitude is how Excel ended up with essentially no real 
data-cleaning tools, which is pathetic.


And your comments once again reflect the issues I have with your 
commentary.


Excel the pathetic dominates the world of spreadsheets. Nuff said. Did
you write an alternative? Why isn't the world using your alternative, if
such a thing exists? Bearing in mind the huge market share of Excel, why
are you overlooking the massive opportunity to clean up via your superior
product?



The job of data tools is to mediate between people and computers, and 
thus helping people identify and understand and fix and improve data 
is just as much the tools' (and tool developers') responsibility as 
showing you a list of entity URIs.


What is a Data Tool? Again, 100% subjective. Some people might think of 
Excel as a Data Tool others see it as something completely different.


The list of data-quality metrics is also effectively a data-tool task 
list.


this is why my demos are oriented towards enabling the beholder
to disambiguate his/her/its quest via filtering applied to entity
types and other properties.


Which is what I was talking about in Boundedness: does the data have 
the properties you need to extract the subset you want. E.g., Danny 
Ayers yesterday was trying to make a SPARQL query for Wordnet that 
found the planets in the solar system that aren't named after Roman 
gods. But neither he nor I could find any way in the data to 
distinguish actual planets in the list of solar bodies, so we couldn't 
quite make it right.


And did you post a callout here or on Twitter or anywhere else for other
folks to chime in?


That was a data problem, not a tool problem. But the difficulty of 
figuring this out, /using/ the tools, was a tool problem.


But the tools (or your activity) unveiled a critical problem aligned to 
your specific goals. That's subjectively bad data laying the foundation for
subjectively improved data. All you need to do is open up a conversation 
that eventually results in a linkset that fixes the problem and delivers 
the context lenses you seek. This is a common and expected issue re. 
Linked Data at any scale, beyond your personal computer or personally 
curated data space.





But of the 17 other qualities on my list + Dave's additions, at least 
15 of them directly bear on the feasibility of using filtering to 
extract a good subset out of a flawed corpus.


In my world, knowledge starts by discovering what you don't know. The same
rule applies to data quality: you have to find the broken data before
you can fix it. Don't take issue with the mechanism that helps you find the
broken data. Of course, do take issue if there isn't a feedback loop or the
loop is clogged with intransigence, etc. Neither is the case in the
Linked Data realms of interest to me.




--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 9:53 AM, glenn mcdonald wrote:
On Tue, Apr 12, 2011 at 8:58 AM, Kingsley Idehen 
<kide...@openlinksw.com> wrote:


1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson
-- basic description of 'Michael Jackson' from DBpedia


The very first assertion on this, your first link, is
"is sameAs of: Michael Rodrick". And you wonder why I keep distracting
your technology demos by talking about data quality...




Again, do you not understand the fundamental point? There is an 
inaccurate assertion in a relation in a given data space. How do you fix
it if you can't see it in the first place? Subjectively bad data can 
lead to subjectively improved data.


You take a single assertion from a 21 Billion+ data space, and decide
that's the essence of the matter. Finding this assertion (a needle in the
21 Billion+ haystack) is part of the point. Excluding the errant named
graph altogether is another, post discovery. Not reasoning on owl:sameAs
assertions is yet another.
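Kingsley's last two options are setting aside an errant named graph and choosing whether to reason over owl:sameAs. Together they can be sketched as computing owl:sameAs equivalence classes from quads while skipping excluded graphs. Below is a union-find sketch in Python. It is not how Virtuoso or any particular store implements this, and the URIs and graph names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, named graph
SAME_AS = "owl:sameAs"

def same_as_classes(quads: Iterable[Quad], excluded_graphs: Set[str]) -> List[Set[str]]:
    """Group resources linked by owl:sameAs, ignoring assertions from excluded graphs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o, g in quads:
        if p == SAME_AS and g not in excluded_graphs:
            parent[find(s)] = find(o)  # merge the two equivalence classes

    classes: Dict[str, Set[str]] = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return [c for c in classes.values() if len(c) > 1]

quads = [
    ("ex:a", "owl:sameAs", "ex:b", "g:one"),
    ("ex:b", "owl:sameAs", "ex:c", "g:suspect"),  # the errant link, in its own graph
]
print([sorted(c) for c in same_as_classes(quads, set())])          # [['ex:a', 'ex:b', 'ex:c']]
print([sorted(c) for c in same_as_classes(quads, {"g:suspect"})])  # [['ex:a', 'ex:b']]
```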




--

Regards,

Kingsley Idehen 

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 But who ever told you, or inferred to you, that any LOD demo is about the
 Complete Linked Data Experience let alone the Complete Data Experience?


I didn't capitalize those. A human's experience of data is the product of
the underlying data and the tool/experience/interface through which they see
it.

Excel the pathetic dominates the world of spreadsheets. Nuff said.


And yet, you don't seem to have dissolved your company, therefore you don't
actually think Excel is the end of all conversations.

Did you write an alternative? Why isn't the world using your alternative, if such
 a thing exists? Bearing in mind the huge market share of Excel, why are you
 overlooking the massive opportunity to clean up via your superior product?


I wasn't making any claims about my project in this thread. But Needle and
Google Refine are two examples of attempts to do data-management tools with
more of a focus on cleanup and curation.


 What is a Data Tool? Again, 100% subjective.


I don't think I know what you mean by the word subjective.

E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet
 that found the planets in the solar system that aren't named after Roman
 gods. But neither he nor I could find any way in the data to distinguish
 actual planets in the list of solar bodies, so we couldn't quite make it
 right.


 And did you post a callout here or on Twitter or anywhere else for other
 folks to chime in?


Yes, Danny asked the question on Twitter and on his blog. I saw it and
answered it. Nobody else chimed in.

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen

On 4/12/11 9:53 AM, glenn mcdonald wrote:
On Tue, Apr 12, 2011 at 8:58 AM, Kingsley Idehen
<kide...@openlinksw.com> wrote:

http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson
-- basic description of 'Michael Jackson' from DBpedia

The very first assertion on this, your first link, is
"is sameAs of: Michael Rodrick". And you wonder why I keep distracting
your technology demos by talking about data quality...

In addition to my prior comments, you could have looked up the source of
the subjectively errant assertion via its source named graph:
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2
Or you could have just followed the link:
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fsw.opencyc.org%2F2008%2F06%2F10%2Fconcept%2FMx4rvWuBAJwpEbGdrcN5Y29ycA
Either way, you would come to realize:

1. The DBMS has many Named Graphs
2. The browser page in question scopes queries to all graphs
3. Nothing about this setup enforces owl:sameAs inference -- the reason
why you have other links showing application of owl:sameAs reasoning to
the data in question.
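The named-graph setup in points 1 and 2 can be sketched with quads, where the fourth element records which graph a triple came from. Below is a minimal Python sketch. It assumes an in-memory list of (subject, predicate, object, graph) tuples and uses None as a wildcard. The resource and graph names are hypothetical stand-ins, not the actual contents of lod.openlinksw.com.

```python
from typing import Iterable, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]                             # subject, predicate, object, graph
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard

def match(quads: Iterable[Quad], pattern: Pattern,
          graphs: Optional[Set[str]] = None) -> List[Quad]:
    """Return quads matching the pattern; graphs=None means 'query every named graph'."""
    hits: List[Quad] = []
    for quad in quads:
        if graphs is not None and quad[3] not in graphs:
            continue  # outside the requested graph scope
        if all(want is None or want == have for want, have in zip(pattern, quad[:3])):
            hits.append(quad)
    return hits

store = [
    ("ex:person1", "rdfs:label", '"Michael Jackson"', "g:A"),
    ("ex:person1", "owl:sameAs", "ex:person2", "g:B"),
]
# Where did the sameAs link come from? The fourth element of each hit answers that.
print(match(store, ("ex:person1", "owl:sameAs", None)))
# Scoping the same subject to a single graph hides the link entirely.
print(match(store, ("ex:person1", None, None), graphs={"g:A"}))
```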

As I've told you repeatedly, we have Named Rules and Named Graphs. In
our world these parts are all loosely coupled so that humans and agents
can pursue their desired world views. I am not trying to enforce
anything on anyone via our technology. Basically, this is about showing
the virtues of loosely coupling critical parts of this Linked Data
ecosystem.

BTW - we are already working with Yago2, ProductOntology, OpenCyc re.
fixes to their DBpedia mappings. All part of a virtuous cycle driven by
conversations about the data with subjective enhancements via context
lenses as the final destination.

To conclude, finding the subjectively bad needle in the haystack is in
and of itself immensely valuable with regard to any pursuit of subjective
data quality. You can't fix what you don't know is broken. LOD is a large
community, as is DBpedia's; nobody (as far as I know) has ever espoused the
position that data quality is a no-go area. What I think people do
espouse, covertly (I might be wrong), is this: make your contribution
rather than berate those already making contributions, however perfect
or imperfect these contributions might be.

Regards,

Kingsley Idehen

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread glenn mcdonald


  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)

The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point out
the hand-waving in this join-via-URI-concatenation...
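Glenn's objection is that CONCAT pastes a WordNet label onto the DBpedia namespace and hopes a resource of that name exists. The Python sketch below contrasts that naive join with a slightly more careful guess modelled on Wikipedia title conventions: spaces become underscores, the first letter is capitalised, and other characters are percent-encoded. These normalisation rules are an assumption, not DBpedia's actual code. No string manipulation can decide which of several same-named things a label refers to; that needs an explicit link or a shared identifier.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def naive_uri(label: str) -> str:
    """What CONCAT(base, ?label) produces: the label pasted on verbatim."""
    return DBPEDIA_RESOURCE + label

def wiki_style_uri(label: str) -> str:
    """Assumed approximation of Wikipedia-style titles: trim, spaces to underscores,
    first letter upper-cased, remaining unsafe characters percent-encoded."""
    title = label.strip().replace(" ", "_")
    if title:
        title = title[0].upper() + title[1:]
    return DBPEDIA_RESOURCE + quote(title, safe="()',")

print(naive_uri("solar system"))       # http://dbpedia.org/resource/solar system
print(wiki_style_uri("solar system"))  # http://dbpedia.org/resource/Solar_system
```

Even the normalised guess can miss. The real article title may differ in capitalisation, or may be a redirect or a disambiguation page, so a join like this needs checking rather than trusting.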

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread Kingsley Idehen


On 4/12/11 10:03 AM, Rob Vesse wrote:


Hi Glenn

Interjecting into your email thread re Danny's SPARQL puzzle in case 
you hadn't seen my tweets to him today on this topic.


On Tue, 12 Apr 2011 09:33:05 -0400, glenn mcdonald <gl...@furia.com>
wrote:


Which is what I was talking about in Boundedness: does the data
have the properties you need to extract the subset you want. E.g.,
Danny Ayers yesterday was trying to make a SPARQL query for
Wordnet that found the planets in the solar system that aren't
named after Roman gods. But neither he nor I could find any way in
the data to distinguish actual planets in the list of solar
bodies, so we couldn't quite make it right. That was a data
problem, not a tool problem. But the difficulty of figuring this
out, /using/ the tools, was a tool problem.

Here is a query that answers Danny's question (also online at
http://pastebin.com/8juVLmCT). You'll need a SPARQL 1.1 engine to run
this. If you don't have a local one to hand (or yours doesn't support all
the features I've used in the query, since some are currently only in the
editors' drafts), you can run it online at
http://www.dotnetrdf.org/demos/leviathan/


AFAIK this should also run under Jena's ARQ (you may need the latest
snapshot), and it should be runnable on sparql.org, except that the site
appears to be down at the moment.


The query is a tad clunky: because the RKB Explorer endpoints support only
SPARQL 1.0, it has to be split into several sections so that the local
SPARQL engine can do the MINUS bit. Once it has done the Wordnet bit, it
constructs a DBpedia resource URI and then uses an EXISTS filter over a
SERVICE call to DBpedia to ensure that the resource is a Planet.


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>

SELECT DISTINCT ?label WHERE
{
  SERVICE <http://wordnet.rkbexplorer.com/sparql/>
  {
    ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
    ?s1 rdfs:label ?label .
  }
  MINUS
  {
    SERVICE <http://wordnet.rkbexplorer.com/sparql/>
    {
      ?s2 wn:hyponymOf <http://wordnet.rkbexplorer.com/id/synset-Roman_deity-noun-1> .
      ?s2 rdfs:label ?label .
    }
  }
  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)
  FILTER(EXISTS
  {
    SERVICE <http://dbpedia.org/sparql>
    {
      ?dbpResource a <http://dbpedia.org/ontology/Planet> .
    }
  })
}

Regards,

Rob Vesse

--
PhD Student
IAM Group
Bay 20, Room 4027, Building 32
Electronics & Computer Science
University of Southampton


Rob,

Nice!

Change that into a SPARQL CONSTRUCT query and you have a
linkset that can be contributed back to DBpedia :-)
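Kingsley suggests turning the query into a SPARQL CONSTRUCT to produce a linkset. Outside SPARQL, the shape of such a linkset file can be shown with a small Python sketch that serialises (source, target) URI pairs as N-Triples. This assumes the query has been extended to return the WordNet resource alongside the DBpedia one; as written, it only selects ?label. The sketch uses owl:sameAs only as a default predicate. Given the Michael Jackson example earlier in the thread, a weaker mapping predicate may be the safer choice. The example.org URI is hypothetical.

```python
from typing import Iterable, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FORBIDDEN = set('<>"{}|^`\\')  # characters that cannot appear inside an N-Triples IRI

def iri(value: str) -> str:
    """Wrap an IRI for N-Triples, refusing values that would break the line."""
    if not value or any(c.isspace() or c in FORBIDDEN for c in value):
        raise ValueError(f"not a usable IRI: {value!r}")
    return f"<{value}>"

def linkset(pairs: Iterable[Tuple[str, str]], predicate: str = OWL_SAME_AS) -> List[str]:
    """One N-Triples statement per (source, target) pair, de-duplicated and sorted."""
    return [f"{iri(s)} {iri(predicate)} {iri(o)} ." for s, o in sorted(set(pairs))]

pairs = [("http://example.org/wordnet/planet-1", "http://dbpedia.org/resource/Jupiter")]
for line in linkset(pairs):
    print(line)
```

Rejecting IRIs that contain spaces also catches the naive-concatenation problem Glenn raised: a label like "solar system" would fail here instead of producing a broken link.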



--

Regards,

Kingsley Idehen 

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 10:08 AM, glenn mcdonald wrote:


But who ever told you, or inferred to you, that any LOD demo is
about the Complete Linked Data Experience let alone the
Complete Data Experience?


I didn't capitalize those. A human's experience of data is the product 
of the underlying data and the tool/experience/interface through which 
they see it.


Via their own inherently subjective context lenses.



Excel the pathetic dominates the world of spreadsheets. Nuff said.


And yet, you don't seem to have dissolved your company, therefore you 
don't actually think Excel is the end of all conversations.


I don't get your point.

We build data access, integration, and management technology. All 
spreadsheets are interesting to us as consumers and presenters of data.
That's it.


On your part, you claim Excel is pathetic. My question to you is: what's 
your alternative? How come it hasn't exploited the massive opportunity 
at hand? Bottom line: your subjective comments about Excel or any other
product are unwarranted.


Look, can't we just have a civil debate? Disagreements and debates are 
healthy in any realm.




Did you write an alternative? Why isn't the world using your
alternative, if such a thing exists? Bearing in mind the huge
market share of Excel, why are you overlooking the massive
opportunity to clean up via your superior product?


I wasn't making any claims about my project in this thread. But Needle 
and Google Refine are two examples of attempts to do data-management 
tools with more of a focus on cleanup and curation.


Google Refine != Excel. That isn't why Excel exists. This is one of 
those context infidelity examples again. My reference to Excel was 
about separating an application that can consume data from the 
technology that delivers data to it, and the actual originating sources 
of said data. Your response was to denigrate Excel, rather than attempt
to grasp my point.



What is a Data Tool? Again, 100% subjective.


I don't think I know what you mean by the word subjective.


Clearly not. And maybe therein lies the problem. Subjective implies
your world view. Example: you see Google Refine vs Excel as an Apples 
vs Apples comparison re. Data Reconciliation matters.





E.g., Danny Ayers yesterday was trying to make a SPARQL query for
Wordnet that found the planets in the solar system that aren't
named after Roman gods. But neither he nor I could find any way
in the data to distinguish actual planets in the list of solar
bodies, so we couldn't quite make it right.


And did you post a callout here or on Twitter or anywhere else for
other folks to chime in?


Yes, Danny asked the question on Twitter and on his blog. I saw it and 
answered it. Nobody else chimed in.


Twitter link? I know Rob Vesse has already chimed in with a suggestion, 
but the call-out link is still interesting :-)



--

Regards,

Kingsley Idehen 

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread Kingsley Idehen


On 4/12/11 10:19 AM, glenn mcdonald wrote:


  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)

The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point 
out the hand-waving in this join-via-URI-concatenation...


What now? You don't like the manner in which a solution has been 
constructed? What are you looking for here?


--

Regards,

Kingsley Idehen 

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen

On 4/12/11 10:30 AM, glenn mcdonald wrote:

In addition to my prior comments, you could have looked up the
source of the subjectively errant assertion

So you call "Michael Jackson owl:sameAs Michael Rodrick" a "subjectively
errant" assertion? I definitely don't know what you mean
by "subjective".

via its source named graph:

http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2

Or you could have just followed the link:

http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fsw.opencyc.org%2F2008%2F06%2F10%2Fconcept%2FMx4rvWuBAJwpEbGdrcN5Y29ycA
.

I can't see how to tell from either link where the sameAs assertion
connecting Jackson to Rodrick came from. Can you show me how to
discern the provenance of that particular triple?

http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2

That's how you discern it's from OpenCyc, since each dataset is loaded
into its own Named Graph.

Even easier: follow the link, then copy the value of @href from "About:
XYZ", or just click on the "About: XYZ" hyperlink and you'll find
yourself in the OpenCyc data space :-)

Regards,

Kingsley Idehen

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 10:37 AM, glenn mcdonald wrote:


On your part, you claim Excel is pathetic.


No, I said that it's pathetic that Excel doesn't offer better tools 
for evaluating and improving data.


Excel has always been extensible. You or anyone else can extend it. 
Thus, how can it be pathetic that Excel doesn't offer this feature when 
it's extremely extensible? The feature in question isn't core 
functionality in the eyes of Excel product developers.




Bottom line, your subjective comments about Excel or any other product
are unwarranted.

Look, can't we just have a civil debate? Disagreements and debates
are healthy in any realm.


Not sure what to do with this pair of statements.

Example: you see Google Refine vs Excel as an Apples vs Apples
comparison re. Data Reconciliation matters.


I said no such thing. I brought up Google Refine precisely because 
it's a different sort of thing than Excel.


You brought it up in the context of Excel, i.e., in response to the 
thread developing around your utterances that comprised the patterns 
"Excel" and "Pathetic". You are basically quibbling about Excel not 
being capable of the functionality delivered by Google Refine, or taking 
the position that it's pathetic that Excel lacks such functionality. You 
quibble about an inaccurate assertion between DBpedia and OpenCyc re. 
owl:sameAs. What point are you trying to make re. "Pathetic" and "Excel" 
with regard to Google Refine?





--

Regards,

Kingsley Idehen 

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread glenn mcdonald


  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)

 The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point out
 the hand-waving in this join-via-URI-concatenation...

 What now? You don't like the manner in which a solution has been
 constructed? What are you looking for here?


I really think you can figure out for yourself what's not so great about
this solution. But to go ahead and state the obvious, this is
concatenating wordnet's rdfs:label for these planets directly into a dbpedia
URI. This will only work if the identifiers happen to line up exactly. Which
they do in the case of these 8 (!) entities, but I wouldn't want to rely on
that tactic in general.
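A minimal Python sketch of the point above, using hypothetical toy identifiers rather than the real WordNet or DBpedia data. `join_by_concat` mimics the BIND(URI(CONCAT(...))) trick, where a label only finds a partner if the concatenated string happens to equal an existing identifier; `join_by_linkset` joins through an explicit, curated mapping of shared identifiers. The function names, the identifier set, and the linkset are illustrative assumptions, not part of any dataset or tool mentioned in this thread.

```python
# Sketch: join-by-concatenation vs. join through an explicit linkset.
# All identifiers below are hypothetical toy data, not real dataset contents.

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

# Hypothetical set of identifiers that exist in the target dataset.
KNOWN_URIS = {
    DBPEDIA_PREFIX + "Mars",
    DBPEDIA_PREFIX + "Earth",
    DBPEDIA_PREFIX + "Solar_System",
}


def join_by_concat(labels, known_uris):
    """Mimic BIND(URI(CONCAT(prefix, ?label))): a label only matches if the
    concatenated string happens to equal an existing identifier."""
    joined = {}
    for label in labels:
        candidate = DBPEDIA_PREFIX + label
        if candidate in known_uris:
            joined[label] = candidate
    return joined


def join_by_linkset(labels, linkset):
    """Join through an explicit, curated mapping of shared identifiers."""
    return {label: linkset[label] for label in labels if label in linkset}


if __name__ == "__main__":
    labels = ["Mars", "Earth", "solar system"]
    # Hypothetical curated linkset: each pair was confirmed by someone,
    # rather than guessed from string equality.
    linkset = {
        "Mars": DBPEDIA_PREFIX + "Mars",
        "Earth": DBPEDIA_PREFIX + "Earth",
        "solar system": DBPEDIA_PREFIX + "Solar_System",
    }
    print(sorted(join_by_concat(labels, KNOWN_URIS)))  # ['Earth', 'Mars']
    print(sorted(join_by_linkset(labels, linkset)))    # ['Earth', 'Mars', 'solar system']
```

In this toy case "solar system" silently drops out of the concatenation join purely because of casing and spacing, which is the kind of miss that makes the tactic hard to rely on in general.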

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 10:59 AM, glenn mcdonald wrote:


If you can't see the data, there's nothing to fix; thus we end up
in a subjective fool's paradise.


Not sure who you're talking to here. I'm certainly not arguing against 
seeing the data.


You continue to imply that seeing subjectively imperfect data projected 
via a data oriented tool is problematic re., your total data 
experience world view.



--

Regards,

Kingsley Idehen 

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread glenn mcdonald


 Stop quibbling, contribute a solution.


As you know, but others might not, I work on www.needlebase.com, a
graph-database project incubated at ITA and due to become part of Google any
hour now. It takes a somewhat different approach to data representation and
data curation than the RDF/OWL/SPARQL stack. It's free for personal uses and
has free trials for commercial uses, so anybody is welcome to find out
whether it's suited for their particular problems.

Fwd: Early Bird Registration - Second Annual VIVO Conference

2011-04-12 Thread Tim Berners-Lee

Wouldn't it be great to have some folks from the LOD community at this
conference to make sure that VIVO interfaces well with the rest
of the LOD cloud?

Tim

Begin forwarded message:

 From: VIVO alici...@ufl.edu
 Date: 2011-04-12 8:35:47 EDT
 To: timbl+v...@w3.org
 Subject: Early Bird Registration - Second Annual VIVO Conference
 Reply-To: alici...@ufl.edu

 Quick Links
 vivoweb.org/conference
 Gaylord National
 Preliminary Agenda
 2010 VIVO Conference
 National Harbor
 Transportation
 DC Area Map

 Contact Us
 We welcome your questions and comments.  Email us:
 VIVO Conference 
 Second Annual VIVO Conference
 August 24-26, 2011
 Gaylord National, Washington D.C.

 More Information
 Early Bird Registration
 Call for Papers
 Call for Apps
 Workshops
 Sponsors
 Explore Gaylord National
 About VIVO
 Early Bird Registration
 Registration is now open for the Second Annual VIVO Conference.

 The $350 Early Bird registration rate is available until May 27.

 You are welcome to register online or by fax/mail.  Gaylord National is 
 offering a $179 discounted room rate for VIVO conference attendees.

 This discount room rate is only available until July 24.  Click here to 
 reserve your room:  VIVO guests at Gaylord National
 Official Call for Papers

 We are pleased to invite you to participate in this year's VIVO conference 
 with contributions to the meeting.

 We request papers, panels and poster presentations which focus on issues that 
 VIVO is trying to address.  Abstracts are due June 1.

 Topics of interest:  Collaboration, Semantic Web, Linked Open Data, Role of 
 VIVO in Science, Adoption of VIVO, Ontologies Implementation of VIVO, Crowd 
 Sourcing, Mapping & Networks, Research Discovery, Research Networking, VIVO 
 Development, Using VIVO data

 Submission:  All submissions are handled electronically at EasyChair.  For 
 information on submission requirements, refer to the Official Call for Papers.
 Official Call for Apps

 The conference is sponsoring a competition for applications using VIVO data 
 to support science.  Entries are due July 31.

 Refer to the Call for National Networking Applications for submission 
 information, including eligibility, evaluation criteria and prizes.

 Workshops 

 The three-day conference begins with a full day of workshops on August 24.

 We are pleased to offer six half-day workshops at this year's conference.

 The workshops are designed for those new to VIVO, those implementing VIVO and 
 those wishing to develop applications using VIVO.

 Each workshop is designed as a stand-alone session that can be mixed and 
 matched depending on your interest in VIVO or your role within a VIVO 
 implementation.

 Morning Workshops (August 24)
 Part I:  Introduction to Development on the Open Source VIVO Project
 Introduction to Implementation
 Visualization in VIVO:  A Case Study in How VIVO Data and Technology Can Be 
 Used

 Afternoon Workshops (August 24)
 Part II:  Advanced Development on the Open Source VIVO Project
 Creating your Marketing & Outreach Plan
 Data acquisition for VIVO:  Extended Ingest by Example
 Sponsors
 Silver, Platinum and Gold Sponsor Packages are available for anyone 
 interested in supporting the Second Annual VIVO Conference.

 In addition to Sponsor Packages, there are several Exclusive Opportunities 
 for marketing and promotion at this year's VIVO conference.

 For more information, contact Sponsorship Manager, Alan Frankle at Designing 
 Events +1-443-213-1950 or refer to the Sponsor Prospectus.
 Explore Gaylord National

 Experience Gaylord National

 Gaylord National Virtual Tour

 Relâche Spa, Fitness Center and Pool 

 Kid's Activities

 Shopping

 Restaurants, Bars & Lounge

 Pose Ultra Lounge (home of this year's poster session!)
 About VIVO
 VIVO is an open source, open ontology, open process platform for hosting 
 information about scientists and their interests, activities and 
 accomplishments.

 VIVO supports open development and facilitates integration of science through 
 simple, standard semantic web technologies.  VIVO: Enabling National Network 
 of Scientists is supported by NIH Award U24 RR029822.

 Learn more at vivoweb.org

 This email was sent to timbl+v...@w3.org by alici...@ufl.edu.
 VIVO | 1600 SW Archer Road | PO Box 100219 | Gainesville | FL | 32610

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread Kingsley Idehen


On 4/12/11 11:16 AM, glenn mcdonald wrote:


Stop quibbling, contribute a solution.


As you know, but others might not, I work on www.needlebase.com, a 
graph-database project incubated at ITA 
and due to become part of Google any hour now. It takes a somewhat 
different approach to data representation and data curation than the 
RDF/OWL/SPARQL stack. It's free for personal uses and has free trials 
for commercial uses, so anybody is welcome to find out whether it's 
suited for their particular problems.


Post a link showing how it solves the problems you've griped about 
without the data living in a silo. By this I mean, the data presentation 
pages and data sources should be loosely coupled. In addition, your Data 
Object Identifiers should resolve to Referent Representations 
(description graphs) via URLs. You do that and I'll retract my silo 
tag :-)


If you have a dataset fix for Danny's problems (or any others you've 
stumbled across along the way) do share via a URL.


--

Regards,

Kingsley Idehen 

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread glenn mcdonald


 If you have a dataset fix for Danny's problems (or any others you've
 stumbled across along the way) do share via a URL.


Well, the problems in Danny's case were these:

- the required query path to connect gods to planets was non-obvious and not
trivial to figure out by exploring
- doing negation in SPARQL 1.0 is clumsy
- the wordnet dataset lacked identification of actual planets

I solved the first problem by just poking around patiently. This kind of
thing is easier and faster in Needle because the Needle explorer UI is
configurable by the user, and can be extended by calculated fields. It might
be interesting to load the Wordnet data into Needle. I haven't done that
yet, and it's bigger than the limits on our free personal accounts, but if
anybody wants to try it, let me know and I'll see if we can set up an
account with higher limits for you.

Negation is definitely better in SPARQL 1.1 than 1.0, so the obvious
solution there is upgrading the server behind the wordnet dataset. The
query would be simpler in Thread, but that's a different topic.**

As for the actual-planet thing, what you really want there is some shared
identifiers. Rob's query used one dataset's strings as parts of another
dataset's identifiers, which is a hopeful approach. I see that dbpedia has
links to opencyc IDs, and wordnet has links to an alternate wordnet URI set
hosted at w3c.org, so maybe there's a link we could find by following those
two chains further.

Absent that, Needle's answer is to support human curation of the data, so
we'd pull in both sets, cluster them for you, and let you confirm or reject
the matches. I don't know what the administrative tools for the wordnet
dataset look like, but I think their RDF version is an export, not the
native form of the data, so there's no real comparison to be made there.


**For the interested, the single-domain SPARQL query was this:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:   <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:   <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:   <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:   <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}

where in Needle this same basic query idea might be done like this:

Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity)
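For anyone comparing the two negation styles above, here is a minimal Python sketch that evaluates both over a tiny in-memory triple set. The triples and synset names are hypothetical stand-ins for the WordNet data, and the helpers are illustrative, not a SPARQL engine: `optional_not_bound` models the SPARQL 1.0 left join followed by FILTER (!bound(?s2)), and `minus` models SPARQL 1.1 MINUS on the shared variable ?s1.

```python
# Toy model (hypothetical data, not real WordNet) of two negation patterns:
# SPARQL 1.0 OPTIONAL + FILTER(!bound(?s2)) and SPARQL 1.1 MINUS.

TRIPLES = {
    ("syn:mars_planet", "memberMeronymOf", "syn:solar_system"),
    ("syn:mars_planet", "label", "Mars"),
    ("syn:mars_planet", "containsWordSense", "ws:mars_1"),
    ("ws:mars_1", "word", "w:mars"),
    ("syn:mars_god", "containsWordSense", "ws:mars_2"),
    ("ws:mars_2", "word", "w:mars"),
    ("syn:mars_god", "hyponymOf", "syn:roman_deity"),
    ("syn:earth", "memberMeronymOf", "syn:solar_system"),
    ("syn:earth", "label", "Earth"),
    ("syn:earth", "containsWordSense", "ws:earth_1"),
    ("ws:earth_1", "word", "w:earth"),
}


def objects(s, p):
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def deity_matches(s1):
    """Bindings of ?s2 for the inner (negated) pattern with ?s1 fixed."""
    found = set()
    for ws1 in objects(s1, "containsWordSense"):
        for w in objects(ws1, "word"):
            for ws2 in subjects("word", w):
                for s2 in subjects("containsWordSense", ws2):
                    if "syn:roman_deity" in objects(s2, "hyponymOf"):
                        found.add(s2)
    return found


def outer_solutions():
    """(?s1, ?planet) pairs for the two required triple patterns."""
    return [(s1, label)
            for s1 in subjects("memberMeronymOf", "syn:solar_system")
            for label in objects(s1, "label")]


def optional_not_bound():
    """OPTIONAL is a left join; the FILTER keeps rows where ?s2 is unbound."""
    rows = []
    for s1, planet in outer_solutions():
        matches = deity_matches(s1)
        if matches:
            rows.extend((planet, s2) for s2 in matches)
        else:
            rows.append((planet, None))
    return {planet for planet, s2 in rows if s2 is None}


def minus():
    """MINUS drops outer solutions compatible with some inner solution
    (here the shared variable is ?s1)."""
    return {planet for s1, planet in outer_solutions() if not deity_matches(s1)}


if __name__ == "__main__":
    print(optional_not_bound())  # {'Earth'}
    print(minus())               # {'Earth'}
```

Both forms agree here because ?s1 is shared between the outer pattern and the negated block. In general MINUS and FILTER NOT EXISTS can differ; for example, when the two sides share no variables, MINUS removes nothing.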

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 You continue to imply that seeing subjectively imperfect data projected via
 a data oriented tool is problematic re., your total data experience world
 view.


I continue to think it's hilarious that you consider it "subjectively
imperfect" that your dataset says Michael Jackson and Michael Rodrick are
the same person. What would constitute "objectively imperfect" to you?

So yes, I think you should feel a little embarrassed about broadcasting
links to a demo in which the very first piece of data one sees is obviously
wrong. You've got billions of entities in dbpedia, and the technology
doesn't care which one you pick, so surely you could pick one where the
errors aren't as prominent. The fact that you didn't, and don't seem to
care, sends a message about your attitude towards data.

Re: Wordnet Planets SPARQL Puzzle

2011-04-12 Thread Kingsley Idehen


On 4/12/11 1:39 PM, glenn mcdonald wrote:


If you have a dataset fix for Danny's problems (or any others
you've stumbled across along the way) do share via a URL.


Well, the problems in Danny's case were these:

- the required query path to connect gods to planets was non-obvious 
and not trivial to figure out by exploring

- doing negation in SPARQL 1.0 is clumsy
- the wordnet dataset lacked identification of actual planets

I solved the first problem by just poking around patiently. This kind 
of thing is easier and faster in Needle because the Needle explorer UI 
is configurable by the user, and can be extended by calculated fields. 
It might be interesting to load the Wordnet data into Needle. I 
haven't done that yet, and it's bigger than the limits on our free 
personal accounts, but if anybody wants to try it, let me know and 
I'll see if we can set up an account with higher limits for you.


Negation is definitely better in SPARQL 1.1 than 1.0, so the obvious 
solution there is upgrading the server behind the wordnet dataset. 
The query would be simpler in Thread, but that's a different topic.**


As for the actual-planet thing, what you really want there is some 
shared identifiers. Rob's query used one dataset's strings as parts of 
another dataset's identifiers, which is a hopeful approach. I see that 
dbpedia has links to opencyc IDs, and wordnet has links to an 
alternate wordnet URI set hosted at w3c.org, so maybe 
there's a link we could find by following those two chains further.


Absent that, Needle's answer is to support human curation of the data, 
so we'd pull in both sets, cluster them for you, and let you confirm 
or reject the matches. I don't know what the administrative tools for 
the wordnet dataset look like, but I think their RDF version is an 
export, not the native form of the data, so there's no real comparison 
to be made there.



**For the interested, the single-domain SPARQL query was this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:   <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:   <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}
and in SPARQL 1.1 it could be simplified to (I think):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn:   <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id:   <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}
where in Needle this same basic query idea might be done like this:
Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity)




Glenn,

Great!

We've achieved something here. You've shared your solution to a problem :-)

Important note to others:

Glenn and I aren't strangers; we've had these debates (sometimes heated) 
repeatedly in the past. The bridge I seek to cross with Glenn simply 
boils down to encouraging more of what he's done here (actual thread and 
this particular post), i.e., spot a problem and provide a solution that's 
ultimately a contribution to the general pool. That (IMHO) is 
exponentially better than shooting down the efforts of others at first 
blush, intentionally or inadvertently.



Glenn: I am 100% in agreement with human curation; I just refer to it 
as "conversation about the data that becomes part of the data". Basically, 
doing today's Wikipedia dance as part of the provenance aspect of a 
given data space. It's why I said in a different thread: we ultimately 
want to be able to discern the "why" dimension of a "who", "what", 
"when", and "where" better than we can today; we'll never figure 
out "why" 100%, but > 0% is valuable in and of itself, etc.


The subjectivity inherent in data quality is why we ultimately have to 
discuss our way to the construction of context lenses. All of this can 
happen in Linked Data form. No need for any Data Silos. Named Graphs, 
Named Rules, and the ability to calibrate context via a combination of 
reasoning and inference rules are integral components of the Linked Data 
mission; at least that's what I see through my subjective context lenses :-)


Links:

1. http://lod.openlinksw.com/c/CV5SCWN -- your SPARQL query
2. http://lod.openlinksw.com/c/CYOT3KC -- SPARQL 1.1 variant
3. http://lod.openlinksw.com/c/CYGCJVN -- DESCRIBE (using this via the raw 
/sparql endpoint will produce a graph in the format of your choice).


We also have a linkset in the making that would simplify this quest next 
time around.


That's what I call

Call For Posters and Demos: Making Sense of Microposts (#MSM2011)

2011-04-12 Thread Milan Stankovic

***
CALL FOR POSTERS, DEMOS AND LATE-BREAKING WORK
***
1st Workshop on Making Sense of Microposts #MSM2011
8th Extended (formerly European) Semantic Web Conference ESWC 2011
30 May 2011
Heraklion, Greece
***
http://research.hypios.com/msm2011/
***

The topic of Making Sense of Microposts generated a lot of interest in the
Semantic Web research community, confirmed by 19 high quality paper
submissions, out of which 9 were accepted. We are now opening a second call
for a poster and demo track for presenting ideas, late-breaking results,
ongoing research projects, and speculative or innovative work in progress.
Posters and Demos are intended to provide authors and participants with the
ability to connect with one another and to engage in discussions about the
work. The call welcomes presentations of work from both research and
industry.

Authors are invited to submit a 2-page paper (PDF, Springer LNCS style [1])
with a separate abstract (up to 150 words). The paper must clearly
demonstrate relevance to the #MSM2011 topics. Decisions about acceptance
will be based on relevance to the Semantic Web area, originality, potential
significance, topicality and clarity.

The accepted posters and demos will be presented in the coffee breaks during
the Workshop, thus giving the poster authors the opportunity to interact
with other participants and obtain feedback on their work. The poster
abstracts will be available on the Workshop website, but will not be
included in the official proceedings.

An award for the most innovative poster/demo will be given by the workshop
chairs.

For more information about Springer's Lecture Notes in Computer Science
(LNCS) please visit:
[1] http://www.springer.com/computer/lncs?SGWID=0-164-7-72376-0

*Topics of Interest*
We encourage submissions from, but not limited to, the following topics of
interest:

Microposts and Semantic Web technologies


   - Knowledge Discovery and Information Extraction
   - Factual Inference
   - Ontology/vocabulary modelling and learning from Microposts
   - Integrating Microposts into the Web of Linked Data


Social/Web Science studies


   - Analysis of Micropost data patterns
   - Motivations for creating and consuming Microposts
   - Relevance of Microposts and factors that influence them
   - Community/network analysis of Micropost dynamics
   - Ethics/privacy implications of publishing and consuming Microposts


Context


   - Utilising context (time, location, feeling)
   - Contextual inference mechanisms
   - Social awareness streams and Online Presence
   - Event Detection


Applying Microposts


   - User profiling/recommendation/personalization approaches using
   Microposts
   - Public opinion mining
   - Trend prediction
   - Expertise finding
   - Business analysis/market scanning
   - Emergency systems
   - Urban sensing and location-based applications



*Important Dates*
Submission (Posters/Demos): May 1, 2011 (23:59 Hawaii time)
Notification (Posters/Demos): May 15, 2011
Camera Ready (Posters/Demos): May 20, 2011


*Submission*
Submission and reviewing of poster papers will be electronic, via the
MSM2011 EasyChair installation at:
https://www.easychair.org/conferences/?conf=msm2011

*General contact*
If you have any questions, please contact: msm.org...@gmail.com

*#MSM Chairs*
Matthew Rowe
Milan Stankovic
Aba-Sah Dadzie
Mariann Hardey

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 1:52 PM, glenn mcdonald wrote:


You continue to imply that seeing subjectively imperfect data
projected via a data oriented tool is problematic re., your total
data experience world view.


I continue to think it's hilarious that you consider it "subjectively 
imperfect" that your dataset says Michael Jackson and Michael Rodrick 
are the same person. What would constitute "objectively imperfect" to you?


The problem is this: it isn't my dataset. It's data loaded into an 
instance of Virtuoso.




So yes, I think you should feel a little embarrassed about 
broadcasting links to a demo in which the very first piece of data one 
sees is obviously wrong.


To you the first piece of that is an owl:sameAs assertion. That's 100% 
fine for you, but that isn't true for everyone else. It just isn't.


You've got billions of entities in dbpedia, and the technology doesn't 
care which one you pick, so surely you could pick one where the errors 
aren't as prominent.


No, DBpedia doesn't have billions of entities; that's just one dataset. 
The Virtuoso instance in question is a LOD cloud cache instance, i.e., 
we've loaded the available datasets into the instance. From that I 
produce a variety of demos, just as anyone else can, since the endpoints 
are all public.


The fact that you didn't, and don't seem to care, sends a message 
about your attitude towards data.


Again, context infidelity. In due course you will understand my point. 
For now, we can go back and forth. You characterization is 100% inaccurate.



--

Regards,

Kingsley Idehen 

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 So yes, I think you should feel a little embarrassed about broadcasting
 links to a demo in which the very first piece of data one sees is obviously
 wrong.

 To you the first piece of that is an owl:sameAs assertion. That's 100% fine
 for you, but that isn't true for everyone else. It just isn't.


Why, is the page dynamically reconfigured for other people? I'm not saying
"first" in some mushy philosophical sense, I'm talking about the first
attribute that appears in the structured-data section of the page, right
under the headings "Attributes" and "Values".

You've got billions of entities in dbpedia, and the technology doesn't care
 which one you pick, so surely you could pick one where the errors aren't as
 prominent.


 No, DBpedia doesn't have a billions of entities, that just one dataset.


What? Whatever: you've got plenty of other entities, so surely you could
pick one where the errors aren't as prominent. Here, for example, is the
next one I tried:

http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FTori_Amos

There are some dubious bits to this, too (she only composed one song?** a
person is subsequent work of a song?***), but at least this is a page
about a person that appears to be about a single person. Same technology,
better demo.

In due course you will understand my point.


Understood your points the first hundred times you stated them. Any time
you'd like to take a turn understanding mine, feel free.


 You characterization is 100% inaccurate.


In the context of your insistence on the subjectivity of everything, I
assume this is intended as a joke. Funnier without the typo.


**Completeness failure
***Modeling Correctness error

Discussion meta-comment

2011-04-12 Thread Hugh Glaser

A recent thread included discussion of how to reply to postings. 

For what it's worth, I don't agree that the best way to reply to a posting 
about doing something in one system is to say:
"Well, this is how I do it in my system."

At its best, it is hard to understand what the respondent means, because it 
entails (at least for the original poster who is looking for feedback on their 
system) working out what the respondent's system view is implicitly, using 
terms that the respondent finds comfortable, but are often alien to the poster.
At its worst, the original message is completely lost, as the thread simply 
moves to a discussion of the respondent's system.

It is far better if respondents try to communicate with the poster by 
addressing the post directly, using the poster's terms wherever they can.
And it should certainly be acceptable to give the poster feedback, including 
comments that may seem negative as well as positive, without having another 
implementation or solution in your pocket.

I, as well as others I know, find the culture that has developed on this list 
of responses saying "Well, this is how I do it" alienating, and thus sometimes a 
barrier to posting and genuine responses, and so actually stifles discussion.

Happy to be told I am wrong, or in a tiny minority, without hearing any 
proposals for better solutions. :-)

Hugh

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 Nothing about the DBMS hosting the datasets (where each has a Named Graph
 IRI) prevents the beholder or consumer from achieving the following via the
 available data access endpoints:

 1. Accessing and altering the source query or SPARQL protocol URL


I tried clicking your OpenLink Data Explorer link to do this, and got a
page with broken graphics and a frozen "loading.." indicator. Tried again
and got to a Data Explorer page that says "0 records (0 triples, 0
properties) match selected filters. Nothing to display. Perhaps your filters
are too restrictive?" So I'd say something is preventing the beholder from
achieving this.

2. Adding or removing pragmas re. inference context (owl:sameAs expansion,
 invocation of fuzzy InverseFunctionalProperty rules, or combination of both)
 as part of the view alteration quest outlined above


I went to the Settings page to check this out, and found the owl:sameAs
toggle. Of course, it's unchecked, despite all those sameAs relationships
showing up, and when I check it they go away, so you've wired the setting
backwards. Nice job.


3. Viewing original or actual query results via alternative tools that can
 process HTTP response payloads -- remember nothing about SPARQL mandates RDF
 as sole query results format across SELECT, DESCRIBE, or CONSTRUCT queries

 4. Sharing new query, new result set, new data presentation etc.. via a URL
 as part of an evolving conversation about the data in question.


These are great. I support HTTP access, multiple formats, and
URL-addressable queries/results/views.

Remember, I do espouse the mantra: "Data is like Wine while Application
 code is like Fish." A Good (Cool) URL or URI should be able to stand the test
 of time :-)


 Catchy.
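Item 2 above refers to toggling owl:sameAs expansion. As a rough illustration of what such expansion generally does (this is not how Virtuoso implements its inference pragmas; the IRIs, the `ex:name` property, and the function names are hypothetical), the Python sketch below groups IRIs into owl:sameAs equivalence classes with union-find and, when expansion is on, merges everything asserted about any member of a class into one description. A single wrong sameAs link is then enough to put two different people's names on one page.

```python
# Sketch of owl:sameAs expansion when building a description page.
# Hypothetical toy IRIs and triples; not Virtuoso's actual implementation.

TRIPLES = [
    ("ex:Michael_Jackson", "ex:name", "Michael Jackson"),
    ("ex:Michael_Rodrick", "ex:name", "Michael Rodrick"),
    # A single erroneous link: with expansion on, it merges two people.
    ("ex:Michael_Jackson", "owl:sameAs", "ex:Michael_Rodrick"),
]


def same_as_classes(triples):
    """Union-find over owl:sameAs links. Returns find(), which maps each
    IRI to a representative of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s
    return find


def describe(iri, triples, expand_same_as):
    """(predicate, object) pairs shown for iri; with expansion on, also
    everything asserted about its owl:sameAs-equivalent IRIs."""
    find = same_as_classes(triples)

    def in_scope(subject):
        if expand_same_as:
            return find(subject) == find(iri)
        return subject == iri

    return sorted((p, o) for s, p, o in triples
                  if p != "owl:sameAs" and in_scope(s))


if __name__ == "__main__":
    print(describe("ex:Michael_Jackson", TRIPLES, expand_same_as=False))
    # [('ex:name', 'Michael Jackson')]
    print(describe("ex:Michael_Jackson", TRIPLES, expand_same_as=True))
    # [('ex:name', 'Michael Jackson'), ('ex:name', 'Michael Rodrick')]
```

The design choice worth noting is that expansion is a query-time view over unchanged triples: turning it off does not delete the bad link, it only stops the merge.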

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 3:02 PM, glenn mcdonald wrote:



So yes, I think you should feel a little embarrassed about
broadcasting links to a demo in which the very first piece of
data one sees is obviously wrong.

To you the first piece of that is an owl:sameAs assertion. That's
100% fine for you, but that isn't true for everyone else. It just
isn't.


Why, is the page dynamically reconfigured for other people?


As per my latest post. It's just a point of view. You are now talking 
about UI aesthetics rather than data quality. The presentation layer is 
just that, a presentation layer. The Data layer is just that, a Data Layer.


I'm not saying "first" in some mushy philosophical sense, I'm talking 
about the first attribute that appears in the structured-data section 
of the page, right under the headings "Attributes" and "Values".

Because out of 21 billion+ records, why should the page order by 
perceived quality of assertion in an owl:sameAs relation? Why? Because 
it might bug you? Is there an inherent semantic in links that implies:


1. Thou must click
2. Thou must click and infer
3. Thou must infer?

Moreover, the issue with OpenCyc links to and from DBpedia (not 
performed by me or anyone at OpenLink Software) is something that is 
going to be resolved when OpenCyc releases a new linkset.


There's absolutely nothing wrong with a page that immediately brings to 
attention misuse or dangerous use of owl:sameAs. You (as a cognitively 
endowed being) see the page in one context; that's fine. But others will 
also look at the same page and see things differently. This is the very 
basis of cognition. We are wired to see things differently. IMHO a 
clever feature inherited from our universe. Imagine if we could only 
observe the same limited dimensions of an observation subject?


The presentation in the page != a position about how I feel about data 
quality. It is just a presentation of data that's loosely coupled to 
its data sources. You can even take the source code of the page and 
tweak it for your specific needs if you like. That's what this is 
supposed to be about.


I could start to understand your view point if my presentation, data 
sources etc. were imposed on you. That simply isn't the case, and 
that's 100% antithetical to the concept of Linked Data that I am 
particularly excited about, i.e., the loose coupling of knowledge, 
information, and data that inherently facilitates free remixing and 
sharing of: data sources, queries, and presentation pages.






You've got billions of entities in dbpedia, and the technology
doesn't care which one you pick, so surely you could pick one
where the errors aren't as prominent.


No, DBpedia doesn't have billions of entities; that's just one
dataset.


What? Whatever: you've got plenty of other entities, so surely you 
could pick one where the errors aren't as prominent. Here, for 
example, is the next one I tried:


http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FTori_Amos


Again, I pick examples like 'Michael Jackson' because like 'New York', 
'Paris' etc., my focal point is/was: use of entity type and other 
attributes as a mechanism for disambiguating my quests for information 
about a specific entity, at massive scales. The aforementioned entity 
examples ultimately accentuate the challenge at hand.


I won't drop triples in the OpenCyc Named Graph simply because of a few 
questionable relations potentially upsetting a few observers. I am more 
interested in real demos, and that means bad or questionable data warts 
are part of the package. Exercises like this have triggered many a 
dataset fix in LOD land. You'd be quite surprised (bearing in mind your 
perception of my data quality values) how many dataset producers I've 
worked with re. data fixes across the ABox and TBox realms.




There are some dubious bits to this, too (she only composed one 
song?** a person is subsequent work of a song?***), but at least 
this is a page about a person that appears to be about a single 
person. Same technology, better demo.


No, your demo of the same technology. That's a better characterization. 
Again, the inherent tone of your commentary continues to echo a 
contentious problem: you can always speak for yourself, just don't speak 
for me. We are individuals (in a ! owl:sameAs relation).




In due course you will understand my point.


Understood your points the first hundred times you stated them. Any 
time you'd like to take a turn understanding mine, feel free.


Open the door first i.e., stop telling me about myself.

We can have a conversation, we've had many in the past. All you have to 
do is open the door.



You characterization is 100% inaccurate.


In the context of your insistence on the subjectivity of everything, I 
assume this is intended as a joke. Funnier without the typo.



**Completeness failure
***Modeling Correctness error



Yes, LOL re. typo too.

Here's a

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 3:25 PM, glenn mcdonald wrote:


Nothing about the DBMS hosting the datasets (where each has a
Named Graph IRI) prevents the beholder or consumer from achieving
the following via the available data access endpoints:

1. Accessing and altering the source query or SPARQL protocol URL


I tried clicking your "OpenLink Data Explorer" link to do this, and 
got a page with broken graphics and a frozen "loading.." indicator. 
Tried again and got to a Data Explorer page that says "0 records (0 
triples, 0 properties) match selected filters. Nothing to display. 
Perhaps your filters are too restrictive?" So I'd say something is 
preventing the beholder from achieving this.


Please post the URL in question so I can double check what's happening. 
Remember, I am sharing URLs across the Web; there are many factors in 
play re. the time-variant nature of resources etc.


Anyway, give me a URL and I can look into what might be happening.



2. Adding or removing pragmas re. inference context (owl:sameAs
expansion, invocation of fuzzy InverseFunctionalProperty rules, or
combination of both) as part of the view alteration quest outlined
above


I went to the Settings page to check this out, and found the 
owl:sameAs toggle. Of course, it's unchecked, despite all those 
sameAs relationships showing up, and when I check it they go away, so 
you've wired the setting backwards. Nice job.


To you, I've wired the setting backwards i.e., I opted not to impose the 
overhead of owl:sameAs union expansion by default. Overhead in this case 
also includes what's ultimately your prime gripe: an unrepresentative 
graph since the union is comprised of attribute=value pairs from 
individuals that aren't the same.


Methinks, the defaults are fine. Worst that happens (without additional 
overhead) is you click a value exposed via a broken owl:sameAs relation. 
The system doesn't reason unless you ask it to do so explicitly.
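A toy Python sketch of the union expansion described above, assuming a plain list of (subject, predicate, object) triples: it follows owl:sameAs links in both directions (the property is symmetric and transitive) and unions the attribute/value pairs of every identifier reached. The identifiers, values and helper names are hypothetical, and this is not how the OpenLink DBMS implements the feature; it only illustrates why a single bad owl:sameAs link mixes two individuals' attributes once expansion is switched on.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def same_as_closure(triples, start):
    """Every identifier reachable from `start` via owl:sameAs, followed in both
    directions because owl:sameAs is symmetric; the traversal gives transitivity."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def describe(triples, subject, expand_same_as=False):
    """Attribute/value pairs for `subject`; with expansion on, the union of the
    pairs of every identifier in its owl:sameAs closure."""
    subjects = same_as_closure(triples, subject) if expand_same_as else {subject}
    return {(p, o) for s, p, o in triples if s in subjects and p != SAME_AS}


# Hypothetical data: ex:b is a *different* person wrongly linked to ex:a.
TRIPLES = [
    ("ex:a", "ex:occupation", "singer"),
    ("ex:b", "ex:occupation", "footballer"),
    ("ex:a", SAME_AS, "ex:b"),
]

print(sorted(describe(TRIPLES, "ex:a")))
# -> [('ex:occupation', 'singer')]
print(sorted(describe(TRIPLES, "ex:a", expand_same_as=True)))
# -> [('ex:occupation', 'footballer'), ('ex:occupation', 'singer')]
```

With expansion off only ex:a's own pair comes back; with it on, both occupations appear in one description.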


Your world view != mine. Thus, don't try to impose *your* information 
expectations on *my* information projections. You can always make a 
different view. That's why loosely coupling information and data is vital.





3. Viewing original or actual query results via alternative tools
that can process HTTP response payloads -- remember nothing about
SPARQL mandates RDF as sole query results format across SELECT,
DESCRIBE, or CONSTRUCT queries

4. Sharing new query, new result set, new data presentation etc..
via a URL as part of an evolving conversation about the data in
question.
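Points 1 and 4 above both rely on the query living inside the URL. Under the SPARQL 1.1 Protocol a query can be sent with HTTP GET as a URL-encoded `query` parameter, so a shared URL can be decoded, edited and re-encoded by anyone. A minimal standard-library sketch; the endpoint address is an assumption based on the "/sparql endpoint" mentioned in this thread, the helper names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address, for illustration only.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying `query` as a URL-encoded parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url):
    """Recover the SPARQL text from a shared query URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


original = (
    "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Michael_Jackson> ?p ?o } "
    "LIMIT 10"
)
shared = query_url(ENDPOINT, original)

# Someone else decodes the URL, alters the query, and shares a new URL.
altered = extract_query(shared).replace("LIMIT 10", "LIMIT 50")
reshared = query_url(ENDPOINT, altered)
print(reshared)
```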


These are great. I support HTTP access, multiple formats, and 
URL-addressable queries/results/views.


But you have a silo. The day you deliver Objects with IDs that resolve 
to their Representations via URLs is the day I'll drop the silo tag 
re. your data space :-)




Remember, I do espouse the mantra: Data is like Wine while
Application code is like Fish. A Good (Cool) URL or URI should be
able to stand the test of time :-)


 Catchy.


Yes, catchy cos it will catch on, courtesy of the burgeoning Web of 
Linked Data.



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 Please post the URL in question so I can double check what's happening.
 Remember, I am sharing URLs across the Web, there are many factors in play
 re. time variant nature of resources. etc..

 Anyway, give me a URL and I can look into what might be happening.


http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson

To you, I've wired the setting backwards i.e., I opted not to impose the
 overhead of owl:sameAs union expansion by default.


No, this is not a "to you" thing. The checkbox is off, but the sameAs
expansions *are* showing. I'm not arguing a philosophical point, I'm
observing that you have a UI bug.

These are great. I support HTTP access, multiple formats, and
 URL-addressable queries/results/views.


 But you have a silo. The day you deliver Objects with IDs that resolve to
 their Representations via URLs is the day I'll drop the silo tag re. your
 data space :-)


I wasn't even talking about Needle, but that day came long ago. All Needle
nodes have IDs that resolve to representations via URLs.

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen


On 4/12/11 3:55 PM, glenn mcdonald wrote:


Please post the URL in question so I can double check what's
happening. Remember, I am sharing URLs across the Web, there are
many factors in play re. time variant nature of resources. etc..

Anyway, give me a URL and I can look into what might be happening.


http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson

To you, I've wired the setting backwards i.e., I opted not to
impose the overhead of owl:sameAs union expansion by default.


No, this is not a "to you" thing. The checkbox is off, but the sameAs 
expansions *are* showing. I'm not arguing a philosophical point, I'm 
observing that you have a UI bug.


The link above doesn't correspond to any link I've sent to you re. 
owl:sameAs inference context. Basically, that's ODE, one of many browsers 
we offer. Its forte isn't showcasing owl:sameAs expansion.


Here are the links I sent earlier:


1. 
http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson 
-- basic description of 'Michael Jackson' from DBpedia


2. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson 
-- list of source named graphs in the host DBMS


3. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2 
-- list of named graphs with triples that reference this subject


4. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=3 
-- explicit owl:sameAs relations across the entire DBMS (clicking on 
each Identifier will unveil the description graph for the Referent of 
said Identifier)


5. 
http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=4 
-- use of an InverseFunctionalProperty based rule to generate a fuzzy 
list of Identifiers that potentially share the same Referent (click on 
each link as per prior step)


6. 
http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&sas=yes 
-- inference context enhanced description of 'Michael Jackson' (this is 
a union expansion of all properties across all Identifiers in an 
owl:sameAs relation with DBpedia Entity, hence use of paging re. 
handling result set size.)


7. 
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jacksonsas=yesp=6lp=7op=4prev=gp=6  
-- Page 5 of 8 re. enhanced description of 'Michael Jackson'.
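Item 5's InverseFunctionalProperty rule rests on one inference: if a property is inverse functional, two subjects sharing a value for it denote the same thing. A hedged Python sketch that groups subjects by (property, value) for an assumed set of inverse functional properties (FOAF declares `foaf:mbox_sha1sum` and `foaf:homepage` inverse functional); the triples and helper names are hypothetical. The output is only as good as the data, which is why such a list is fuzzy: one wrong homepage value yields a false match.

```python
from collections import defaultdict
from itertools import combinations

# Assumed set of inverse functional properties.
IFPS = {"foaf:mbox_sha1sum", "foaf:homepage"}


def candidate_same_as(triples, ifps=IFPS):
    """Pairs of subjects that share a value for some inverse functional property."""
    holders = defaultdict(set)  # (property, value) -> subjects carrying it
    for s, p, o in triples:
        if p in ifps:
            holders[(p, o)].add(s)
    pairs = set()
    for subjects in holders.values():
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


# Hypothetical triples: ex:mj1 and ex:mj2 share a homepage; ex:mj3 only shares a name.
TRIPLES = [
    ("ex:mj1", "foaf:homepage", "http://example.org/mj"),
    ("ex:mj2", "foaf:homepage", "http://example.org/mj"),
    ("ex:mj1", "foaf:name", "Michael Jackson"),
    ("ex:mj3", "foaf:name", "Michael Jackson"),
]

print(candidate_same_as(TRIPLES))  # {('ex:mj1', 'ex:mj2')}
```

Treating a property that is not inverse functional, such as `foaf:name`, as if it were one would pair ex:mj1 with ex:mj3, which is exactly how a bad rule manufactures bad identity links.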


I also sent the following links in response to your SPARQL solution to 
Danny's puzzle:



1. http://lod.openlinksw.com/c/CV5SCWN -- your SPARQL query
2. http://lod.openlinksw.com/c/CYOT3KC -- SPARQL 1.1 variant
3. http://lod.openlinksw.com/c/CYGCJVN - DESCRIBE (using this via raw 
/sparql endpoint will produce a graph in the format of your choice).
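On choosing the output format for a DESCRIBE sent to the raw endpoint: in the SPARQL 1.1 Protocol the client normally selects the serialization through HTTP content negotiation, i.e. the Accept header. A sketch with Python's urllib that builds, but does not send, such a request; the endpoint address and helper name are assumptions, and whether a given endpoint honours a particular media type is not something this sketch establishes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint address, for illustration only.
ENDPOINT = "http://lod.openlinksw.com/sparql"


def describe_request(endpoint, iri, media_type="text/turtle"):
    """Prepare a GET request for DESCRIBE <iri>, asking for `media_type`."""
    url = endpoint + "?" + urlencode({"query": f"DESCRIBE <{iri}>"})
    return Request(url, headers={"Accept": media_type})


req = describe_request(
    ENDPOINT, "http://dbpedia.org/resource/Michael_Jackson", "application/rdf+xml"
)
print(req.full_url)
print(req.get_header("Accept"))
# urllib.request.urlopen(req) would send it; omitted here.
```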


Your queries:


**For the interested, the single-domain SPARQL query was this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>

SELECT DISTINCT ?planet WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}
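Both queries act as an anti-join: keep a solar-system synset only if no Roman-deity synset shares one of its words. A small Python simulation of the two SPARQL algebra operators involved (a left join followed by FILTER on !bound, versus MINUS), run on hypothetical solution mappings rather than the real WordNet data, shows them agreeing here. One known difference: MINUS removes nothing when the two sides share no variables, whereas with OPTIONAL every inner solution is then compatible, so the !bound filter drops every row whenever the inner pattern matches anything. The queries above avoid this because ?s1 is shared.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def optional_then_not_bound(outer, inner, var):
    """OPTIONAL { inner } ... FILTER(!bound(?var)): left join, then keep unbound rows."""
    rows = []
    for mu in outer:
        merged = [{**mu, **nu} for nu in inner if compatible(mu, nu)]
        rows.extend(merged or [mu])
    return [row for row in rows if var not in row]


def minus(outer, inner):
    """MINUS { inner }: drop mu if a compatible nu shares at least one variable."""
    return [
        mu for mu in outer
        if not any(compatible(mu, nu) and mu.keys() & nu.keys() for nu in inner)
    ]


# Hypothetical solutions of the outer pattern (?s1, ?planet) ...
OUTER = [
    {"s1": "syn:mars-planet", "planet": "Mars"},
    {"s1": "syn:earth-planet", "planet": "Earth"},
]
# ... and of the inner pattern: only the Mars synset shares a word with a deity synset.
INNER = [
    {"s1": "syn:mars-planet", "ws1": "ws:mars-1", "w": "Mars",
     "ws2": "ws:mars-2", "s2": "syn:mars-deity"},
]

print(optional_then_not_bound(OUTER, INNER, "s2"))
print(minus(OUTER, INNER))
```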






These are great. I support HTTP access, multiple formats, and
URL-addressable queries/results/views.


But you have a silo. The day you deliver Objects with IDs that
resolve to their Representations via URLs is the day I'll drop the
silo tag re. your data space :-)


I wasn't even talking about Needle, but that day came long ago. All 
Needle nodes have IDs that resolve to representations via URLs.


Okay, what were you talking about? Specificity helps everyone, this is 
a public forum etc..




--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: Discussion meta-comment

2011-04-12 Thread David. Huynh

+1 to your observation. And FWIW, I hesitated for 30 minutes literally before 
sending this message, deciding to say very little lest I get pulled into some 
philosophical debate myself :)

Sent from my iPhone

On Apr 12, 2011, at 12:10 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote:

 A recent thread included discussion of how to reply to postings. 
 
 For what it's worth, I don't agree that the best way to reply to a posting 
 about doing something in one system is to say:
 "Well this is how I do it in my system."
 
 At its best, it is hard to understand what the respondent means, because it 
 entails (at least for the original poster who is looking for feedback on 
 their system) working out what the respondent's system view is implicitly, 
 using terms that the respondent finds comfortable, but are often alien to the 
 poster.
 At its worst, the original message is completely lost, as the thread simply 
 moves to a discussion of the respondent's system.
 
 It is far better if respondents try to communicate with the poster by 
 addressing the post directly, using the poster's terms wherever they can.
 And it should certainly be acceptable to give the poster feedback, including 
 comments that may seem negative as well as positive, without having another 
 implementation or solution in your pocket.
 
 I, as well as others I know, find the culture that has developed on this list 
 of responses saying "Well this is how I do it" alienating, and thus sometimes 
 a barrier to posting and genuine responses, and so actually stifles 
 discussion.
 
 Happy to be told I am wrong, or in a tiny minority, without hearing any 
 proposals for better solutions. :-)
 
 Hugh

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald

http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson

The link above doesn't correspond to any link I've sent to you re. owl:sameAs
inference context. Basically, that's ODE, one of many browsers we offer. Its
forte isn't showcasing owl:sameAs expansion.

Here are the links I sent earlier:

1.
http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson
-- basic description of 'Michael Jackson' from DBpedia

As I said already, go to this link and then click "OpenLink Data Explorer"
at the bottom, hoping, as the message promises, to "Explore alternative
Linked Data Views & Meshups". Is there another link somewhere to get to the
SPARQL query behind the page? I don't see one.

I wasn't even talking about Needle, but that day came long ago. All Needle
nodes have IDs that resolve to representations via URLs.

Okay, what were you talking about? Specificity helps everyone, this is a
public forum etc..

In Needle's Pazz Jop music-poll dataset, the (relative) ID for Michael
Jackson is 76337.

Here's a URI for the Michael Jackson node in Needle's Pazz Jop dataset:

https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337

Here's a URL for seeing that node in Needle's UI:

https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337&typeId=9149585060559937608&render=List

Here's a URL for getting that data in JSON:

I make no claims about the prettiness of these.
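The "%40" in these URLs is simply a percent-encoded "@", so the relative ID 76337 travels in the `thread` parameter as `@76337`. A hedged Python sketch that builds and parses such a URL; the parameter meanings are inferred from the URLs quoted in this thread rather than from any Needle documentation, and both helpers are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://pub.needlebase.com/actions/visualizer/V2Visualizer.do"


def node_url(domain, node_id):
    """Hypothetical helper: URL for a node, with the ID passed as '@<id>'."""
    return BASE + "?" + urlencode({"domain": domain, "thread": f"@{node_id}"})


def node_id_from_url(url):
    """Hypothetical helper: pull the relative ID back out of such a URL."""
    thread = parse_qs(urlsplit(url).query)["thread"][0]  # e.g. '@76337'
    return thread.lstrip("@")


url = node_url("Pazz-Jop", 76337)
print(url)
print(node_id_from_url(url))
```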

Re: Discussion meta-comment

2011-04-12 Thread Kingsley Idehen


On 4/12/11 4:33 PM, David. Huynh wrote:

  I, as well as others I know, find the culture that has developed on this list of 
responses saying "Well this is how I do it" alienating, and thus sometimes a 
barrier to posting and genuine responses, and so actually stifles discussion.

David/Hugh,

I get the point, but don't know the comment target, so I'll respond with 
regard to myself as one participant in today's extensive debate with Glenn.


I hope I haven't said or implied "this is how we/I do it" without 
providing, at the very least, a link to what I am talking about etc.?


More than anything else, I believe tractable discussion and debate is 
good. That's the most predictable route (I know)  to solving real 
problems etc..


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread Kingsley Idehen

On 4/12/11 5:30 PM, glenn mcdonald wrote:

http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson

The link above doesn't correspond to any link I've sent to you re.
owl:sameAs inference context. Basically, that's ODE, one of many
browsers we offer. Its forte isn't showcasing owl:sameAs expansion.

Here are the links I sent earlier:

http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson
-- basic description of 'Michael Jackson' from DBpedia

As I said already, go to this link and then click "OpenLink Data
Explorer" at the bottom, hoping, as the message promises, to "Explore
alternative Linked Data Views & Meshups". Is there another link
somewhere to get to the SPARQL query behind the page? I don't see one.

Link:

1.
http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Flod.openlinksw.com%2Fdescribe%2F%3Furi%3Dhttp%253A%252F%252Fdbpedia.org%252Fresource%252FMichael_Jackson+useragentheader=acceptheader=
-- URI Debugger output showing you what's in <link/> and/or what you can
extract via Link: response headers
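The Link: response header uses the syntax now standardized in RFC 8288 (originally RFC 5988): comma-separated `<target>; param=value` entries. A deliberately simplified Python parser (it does not handle commas or semicolons inside quoted parameter values) that pulls out alternate representations by media type; the header value and helper names are hypothetical, not captured from lod.openlinksw.com.

```python
import re

# One link: <target> followed by zero or more ;-separated parameters.
LINK = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def parse_link_header(value):
    """Return [(target, {param: value})] for a Link header (simplified)."""
    links = []
    for match in LINK.finditer(value):
        target, params = match.group(1), match.group(2)
        attrs = {}
        for param in params.split(";"):
            if "=" in param:
                key, val = param.split("=", 1)
                attrs[key.strip().lower()] = val.strip().strip('"')
        links.append((target, attrs))
    return links


def alternates(value, media_type):
    """Targets advertised as rel="alternate" with the given type."""
    return [
        target
        for target, attrs in parse_link_header(value)
        if "alternate" in attrs.get("rel", "").split()
        and attrs.get("type") == media_type
    ]


# Hypothetical header value.
HEADER = (
    '<http://example.org/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fexample.org%2Fx%3E>; '
    'rel="alternate"; type="text/turtle", '
    '<http://example.org/x.json>; rel="alternate"; type="application/json"'
)

print(alternates(HEADER, "text/turtle"))
```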

As for the sole ODE option, that's kinda confusing as it isn't a
coherent segue in the grand scheme of things for the human user per se.
What I mean by that is this: we should also include the following links
in the footer:

1. iSPARQL link that places you in the QBE or Advanced Mode tabs of our
SPARQL Query Builder -- in this case you would see the DESCRIBE Query
rather than having to process the encoded URLs in <head/> <link/> or
Link: response headers

2. PivotViewer links that place you either in the PivotViewer
description page or the SPARQL query editor

3. Raw /sparql endpoint page that like #1 alleviates tedium of decoding
the encoded SPARQL protocol URLs.

Then via 1-3 you end up with more human-friendly routes to the SPARQL behind
the page.

I wasn't even talking about Needle, but that day came long ago.
All Needle nodes have IDs that resolve to representations via URLs.

Okay, what were you talking about? Specificity helps everyone,
this is a public forum etc..

In Needle's Pazz Jop music-poll dataset, the (relative) ID for
Michael Jackson is 76337.

Here's a URI for the Michael Jackson node in Needle's Pazz Jop dataset:

https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337

Here's a URL for seeing that node in Needle's UI:

https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337&typeId=9149585060559937608&render=List

Here's a URL for getting that data in JSON:

https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false

I make no claims about the prettiness of these.

Not worried about the prettiness etc..

Are 'Michael Jackson' Object ID and Object Representation Access Address
distinct? I already know that an HTTP GET against the Representation
Address will return an EAV graph, once you loosen authentication
requirements re. JSON representation :-)
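The expectation that the representation is an EAV graph can be sketched: flatten a flat JSON record into (entity, attribute, value) rows, keeping the entity ID separate from whatever address the JSON was fetched from. The JSON shape and the `to_eav` helper below are hypothetical, not Needle's actual response format.

```python
import json


def to_eav(entity_id, record):
    """Flatten a flat JSON object into (entity, attribute, value) rows;
    list values become one row per element."""
    rows = []
    for attribute, value in record.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            rows.append((entity_id, attribute, item))
    return rows


# Hypothetical payload; the entity ID is kept apart from the fetch address.
payload = json.loads('{"name": "Michael Jackson", "albums": ["Thriller", "Bad"]}')
for row in to_eav("@76337", payload):
    print(row)
```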

BTW - can I assume this is the actual URL that you intended above re.
access to JSON based graph representation:
https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false
?

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Re: 15 Ways to Think About Data Quality (Just for a Start)

2011-04-12 Thread glenn mcdonald


 Are 'Michael Jackson' Object ID and Object Representation Access Address
 distinct?


Yes.

BTW - can I assume this is the actual URL that you intended above re. access
 to JSON based graph representation:
 https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false


You can assume it if you'd like. Or you can just try it and see...
