Re: 15 Ways to Think About Data Quality (Just for a Start)
Glenn and all, greetings. On 2011 Apr 9, at 03:10, glenn mcdonald wrote: I don't think data quality is an amorphous, aesthetic, hopelessly subjective topic. Data beauty might be subjective, and the same data may have different applicability to different tasks, but there are a lot of obvious and straightforward ways of thinking about the quality of a dataset independent of the particular preferences of individual beholders. Here are just some of them: This is an excellent list. I think only a minority of these qualities could be scored precisely, but I think all of them could be scored on some awful-to-excellent scale, so that while they may not be quite objective metrics, they're at least clearly debatable. Complete objectivity is probably impossible here -- inevitably so, in a world where the concept of 'Rome' means significantly different things to the local authority, the ancient historian, and the tourist board. But 'solves my problem well' is a pretty good substitute. Best wishes, Norman -- Norman Gray : http://nxg.me.uk
Re: 15 Ways to Think About Data Quality (Just for a Start)
Hi Glenn, This reminds me of some established frameworks. Here is a list of criteria gathered from the literature for metadata quality [1]. It is not exhaustive. Besiki Stvilia has also worked on a more comprehensive framework [2]. More has been done on information quality in general. However, I guess they do not cover all the aspects you mentioned, in particular in relation to the ontology used and the linkage aspects, for instance. *Completeness* In a complete metadata record, the learning object is described using all the fields that are relevant to describe it. *Accuracy* In an accurate metadata record, the data contained in the fields correspond to the object that is being described. *Provenance* The provenance parameter reflects the degree of trust that you have in the creator of the metadata record. *Conformance to expectations* This parameter measures how well the data contained in the record let you gain knowledge about the learning object without actually seeing the object. *Logical consistency and coherence* This parameter reflects two measures: consistency measures whether the values chosen for different fields in the record agree with each other; coherence measures whether all the fields talk about the same object. *Timeliness* This parameter measures how up-to-date the metadata record is compared with changes in the object. *Accessibility* This parameter measures how well you are able to understand the content of the metadata record. Muriel Foulonneau [1] Thomas R. Bruce and Diane I. Hillmann, 'The Continuum of Metadata Quality' [2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.8053&rep=rep1&type=pdf On Sat, Apr 9, 2011 at 3:10 AM, glenn mcdonald gmcdon...@furia.com wrote: I don't think data quality is an amorphous, aesthetic, hopelessly subjective topic. Data beauty might be subjective, and the same data may have different applicability to different tasks, but there are a lot of obvious and straightforward ways of thinking about the quality of a dataset independent of the particular preferences of individual beholders. Here are just some of them: 1. Accuracy: Are the individual nodes that refer to factual information factually and lexically correct? Like, is Chicago spelled Chigaco, or does the dataset say its population is 2.7? 2. Intelligibility: Are there human-readable labels on things, so you can tell what a thing is when you're looking at it? Is there a model, so you can tell what questions you can ask? If a thing has multiple labels (or a set of owl:sameAs things have multiple labels), do you know which (or if) one is canonical? 3. Referential correspondence: If a set of data points represents some set of real-world referents, is there one and only one point per referent? If you have 9,780 data points representing cities, but 5 of them are Chicago, Chicago, IL, Metro Chicago, Metropolitain Chicago, Illinois and Chicagoland, that's bad. 4. Completeness: Where you have data representing a clear finite set of referents, do you have them all? All the countries, all the states, all the NHL teams, etc.? And if you have things related to these sets, are those projections complete? Populations of every country? Addresses of arenas of all the hockey teams? 5. Boundedness: Where you have data representing a clear finite set of referents, is it unpolluted by other things? E.g., can you get a list of current real countries, not mixed with former states or fictional empires or administrative subdivisions? 6. Typing: Do you really have properly typed nodes for things, or do you just have literals?
The first president of the US was not George Washington^^xsd:string, it was a person whose name-renderings include George Washington. Your ability to ask questions will be constrained or crippled if your data doesn't know the difference. 7. Modeling correctness: Is the logical structure of the data properly represented? Graphs are relational databases without the crutch of rows; if you screw up the modeling, your queries will produce garbage. 8. Modeling granularity: Did you capture enough of the data to actually make use of it? :us :president :george_washington isn't exactly wrong, but it's pretty limiting. Model presidencies, with their dates, and you've got much more powerful data. 9. Connectedness: If you're bringing together datasets that used to be separate, are the join points represented properly? Is the US from your country list the same as (or owl:sameAs) the US from your list of presidencies and the US from your list of world cities and their populations? 10. Isomorphism: If you're bringing together datasets that used to be separate, are their models reconciled? Does an album contain songs, or does it contain tracks which are publications of recordings of songs, or something else? If each data point answers this question differently, even simple-seeming queries may be intractable. 11. Currency: Is the data up-to-date? 12. Directionality: Can you
Re: 15 Ways to Think About Data Quality (Just for a Start)
On Fri, 2011-04-08 at 21:10 -0400, glenn mcdonald wrote: I don't think data quality is an amorphous, aesthetic, hopelessly subjective topic. Data beauty might be subjective, and the same data may have different applicability to different tasks, but there are a lot of obvious and straightforward ways of thinking about the quality of a dataset independent of the particular preferences of individual beholders. Here are just some of them: 1. Accuracy: Are the individual nodes that refer to factual information factually and lexically correct? Like, is Chicago spelled Chigaco, or does the dataset say its population is 2.7? 2. Intelligibility: Are there human-readable labels on things, so you can tell what a thing is when you're looking at it? Is there a model, so you can tell what questions you can ask? If a thing has multiple labels (or a set of owl:sameAs things have multiple labels), do you know which (or if) one is canonical? 3. Referential correspondence: If a set of data points represents some set of real-world referents, is there one and only one point per referent? If you have 9,780 data points representing cities, but 5 of them are Chicago, Chicago, IL, Metro Chicago, Metropolitain Chicago, Illinois and Chicagoland, that's bad. 4. Completeness: Where you have data representing a clear finite set of referents, do you have them all? All the countries, all the states, all the NHL teams, etc.? And if you have things related to these sets, are those projections complete? Populations of every country? Addresses of arenas of all the hockey teams? 5. Boundedness: Where you have data representing a clear finite set of referents, is it unpolluted by other things? E.g., can you get a list of current real countries, not mixed with former states or fictional empires or administrative subdivisions? 6. Typing: Do you really have properly typed nodes for things, or do you just have literals? The first president of the US was not George Washington^^xsd:string, it was a person whose name-renderings include George Washington. Your ability to ask questions will be constrained or crippled if your data doesn't know the difference. 7. Modeling correctness: Is the logical structure of the data properly represented? Graphs are relational databases without the crutch of rows; if you screw up the modeling, your queries will produce garbage. 8. Modeling granularity: Did you capture enough of the data to actually make use of it? :us :president :george_washington isn't exactly wrong, but it's pretty limiting. Model presidencies, with their dates, and you've got much more powerful data. 9. Connectedness: If you're bringing together datasets that used to be separate, are the join points represented properly? Is the US from your country list the same as (or owl:sameAs) the US from your list of presidencies and the US from your list of world cities and their populations? 10. Isomorphism: If you're bringing together datasets that used to be separate, are their models reconciled? Does an album contain songs, or does it contain tracks which are publications of recordings of songs, or something else? If each data point answers this question differently, even simple-seeming queries may be intractable. 11. Currency: Is the data up-to-date? 12. Directionality: Can you navigate the logical binary relationships in either direction? Can you get from a country to its presidencies to their presidents, or do you have to know to only ask about presidents' presidencies' countries?
Or worse, do you have to ask every question in permutations of directions because some data asserts things one way and some asserts it only the other? 13. Attribution: If your data comes from multiple sources, or in multiple batches, can you tell which came from where? 14. History: If your data has been edited, can you tell how and by whom? 15. Internal consistency: Do the populations of your counties add up to the populations of your states? Do the substitutes going into your soccer matches balance the substitutes going out? That's a fantastic list and should be recorded on a wiki somewhere! A minor quibble, not sure about Directionality. You can follow an RDF link in both directions (at least in SPARQL and any RDF API I've worked with). I would be inclined to generalize and rephrase this as ... Consistency of modelling: whichever way you make modelling decisions such as direction of relations (from country to president, from president to country) it is done consistently so you don't have to ask many permutations of the same query. Possible additions: Licensed: the license under which the data can be used is clearly defined, ideally in a machine checkable way. Sustainable: there is some credible basis for believing the data will be maintained as current (e.g. backed by some appropriate organization or by a sufficiently large group of individuals, has
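To make item 8 on the list concrete, here is a minimal SPARQL sketch of the "presidencies as first-class things" idea. The vocabulary (:Presidency, :of_country, :president, :start_date) is invented for illustration rather than taken from any dataset in the thread; the point is only that once terms of office are nodes with dates, questions the bare triple :us :president :george_washington cannot answer become one small query.

PREFIX : <http://example.org/vocab#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# "Who has been a US president since 1900, in order of taking office?"
SELECT ?president ?start
WHERE {
  ?term a :Presidency ;        # each term of office is its own node...
        :of_country :us ;
        :president ?president ;
        :start_date ?start .   # ...so it can carry dates and other detail
  FILTER (?start >= "1900-01-01"^^xsd:date)
}
ORDER BY ?start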
Re: 15 Ways to Think About Data Quality (Just for a Start)
As part of conversations about data, you do need to be able to see the subjectively bad to make it subjectively good. What you can't do (which is what Glenn does repeatedly) is conflate the tools that actually enable you to see the subjectively good, bad, or ugly with said data. I'm a tool developer with first hand experience, as you put it, too. I'm not conflating the tools and the data. But the complete data experience is the product of the tools and the data. Is Excel rendered useless because a list of countries with obvious errors was presented in the spreadsheet? To an audience of Spreadsheet developers (programmers making a Spreadsheet product) that's irrelevant. That attitude is how Excel ended up with essentially no real data-cleaning tools, which is pathetic. The job of data tools is to mediate between people and computers, and thus helping people identify and understand and fix and improve data is just as much the tools' (and tool developers') responsibility as showing you a list of entity URIs. The list of data-quality metrics is also effectively a data-tool task list. This is why my demos are oriented towards enabling the beholder to disambiguate his/her/its quest via filtering applied to entity types and other properties. Which is what I was talking about in Boundedness: does the data have the properties you need to extract the subset you want. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. That was a data problem, not a tool problem. But the difficulty of figuring this out, *using* the tools, was a tool problem. But of the 17 other qualities on my list + Dave's additions, at least 15 of them directly bear on the feasibility of using filtering to extract a good subset out of a flawed corpus.
Re: 15 Ways to Think About Data Quality (Just for a Start)
A minor quibble, not sure about Directionality. You can follow an RDF link in both directions (at least in SPARQL and any RDF API I've worked with). I would be inclined to generalize and rephrase this as ... Consistency of modelling: whichever way you make modelling decisions such as direction of relations (from country to president, from president to country) it is done consistently so you don't have to ask many permutations of the same query. Yes, inconsistency is the worst kind of directionality problem, but to me Directionality is still a problem in itself. An RDF browser that shows you both the incoming and outgoing triples is *addressing* that problem, as is anything that infers the inverses. But the problem still exists. It's an artificial skew between the logical properties of the data and the manifest properties in the system. An alternate data-modeling regime in which both directions are always explicitly asserted would not have this problem (but would take on, obviously, a higher burden on internal consistency as a result). I like your additions of Licensing, Sustainability and Authority.
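For reference, SPARQL 1.1 property paths illustrate both halves of this directionality point: a query author (or a tool) can walk a relation backwards with the ^ inverse operator, which papers over the skew Glenn describes, but only if the data asserts the relation consistently in one direction to begin with. A small sketch, reusing the toy :president property from Glenn's item 8; the :president_of property in the second pattern is invented here purely to show the inconsistent case Dave warns about.

PREFIX : <http://example.org/vocab#>
SELECT ?country
WHERE {
  # inverse path: start from the person even though the data points country -> president
  { :george_washington ^:president ?country }
  UNION
  # if some sources model it the other way round, every query needs this second arm too
  { :george_washington :president_of ?country }
}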
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 9:33 AM, glenn mcdonald wrote: As part of conversations about data, you do need to be able to see the subjectively bad to make it subjectively good. What you can't do (which is what Glenn does repeatedly) is conflate the tools that actually enable you to see the subjectively good, bad, or ugly with said data. I'm a tool developer with first hand experience, as you put it, too. I'm not conflating the tools and the data. But the complete data experience is the product of the tools and the data. But who ever told you, or implied to you, that any LOD demo is about the Complete Linked Data Experience, let alone the Complete Data Experience? Who even knows, emphatically, what the so-called Complete Data Experience actually is? That's as subjective a statement as I've ever heard. It's the very line that continues to separate us. I might have my own perception of the aforementioned experience, but I have no business enforcing that on anyone else; it's just my world view, end of story. Thus, I hold my position re. your subjective conflation of matters. When people publish demos of their products, they aren't publishing the demos for your world view; they are publishing it from theirs, first. Of course, bearing in mind our similarities and disparities as cognitive beings there is varied potential for intersection of world views i.e., fusion. Naturally, fusion can occur with varying degrees of friction. Is Excel rendered useless because a list of countries with obvious errors was presented in the spreadsheet? To an audience of Spreadsheet developers (programmers making a Spreadsheet product) that's irrelevant. That attitude is how Excel ended up with essentially no real data-cleaning tools, which is pathetic. And your comments once again reflect the issues I have with your commentary. Excel the pathetic dominates the world of spreadsheets. Nuff said. Did you write an alternative? Why isn't the world using your alternative, if such a thing exists? Bearing in mind the huge market share of Excel, why are you overlooking the massive opportunity to clean up via your superior product? The job of data tools is to mediate between people and computers, and thus helping people identify and understand and fix and improve data is just as much the tools' (and tool developers') responsibility as showing you a list of entity URIs. What is a Data Tool? Again, 100% subjective. Some people might think of Excel as a Data Tool; others see it as something completely different. The list of data-quality metrics is also effectively a data-tool task list. This is why my demos are oriented towards enabling the beholder to disambiguate his/her/its quest via filtering applied to entity types and other properties. Which is what I was talking about in Boundedness: does the data have the properties you need to extract the subset you want. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. And did you post a callout here or on Twitter or anywhere else for other folks to chime in? That was a data problem, not a tool problem. But the difficulty of figuring this out, /using/ the tools, was a tool problem. But the tools (or your activity) unveiled a critical problem aligned to your specific goals. That's subjectively bad data laying the foundation for subjectively improved data.
All you need to do is open up a conversation that eventually results in a linkset that fixes the problem and delivers the context lenses you seek. This is a common and expected issue re. Linked Data at any scale, beyond your personal computer or personally curated data space. But of the 17 other qualities on my list + Dave's additions, at least 15 of them directly bear on the feasibility of using filtering to extract a good subset out of a flawed corpus. In my world: knowledge starts by discovering what you don't know. The same rule applies to data quality; you have to find the broken data before you can fix it. Don't take issue with the mechanism that helps you find the broken data. Of course, take issue if there isn't a feedback loop or the loop is clogged with intransigence etc. Neither is the case in the Linked Data realms of interest to me. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 9:53 AM, glenn mcdonald wrote: On Tue, Apr 12, 2011 at 8:58 AM, Kingsley Idehen kide...@openlinksw.com wrote: 1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- basic description of 'Michael Jackson' from DBpedia The very first assertion on this, your first link, is 'is sameAs of: Michael Rodrick'. And you wonder why I keep distracting your technology demos by talking about data quality... Again, do you not understand the fundamental point? There is an inaccurate assertion in a relation in a given data space. How do you fix it if you can't see it in the first place? Subjectively bad data can lead to subjectively improved data. You take a single assertion from a 21 Billion+ data space, and decide that's the essence of the matter. Finding this assertion (needle in the 21 Billion+ haystack) is part of the point. Negating the errant named graph altogether is another, post discovery. Not reasoning on the owl:sameAs assertion is yet another. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
But who ever told you, or implied to you, that any LOD demo is about the Complete Linked Data Experience, let alone the Complete Data Experience? I didn't capitalize those. A human's experience of data is the product of the underlying data and the tool/experience/interface through which they see it. Excel the pathetic dominates the world of spreadsheets. Nuff said. And yet, you don't seem to have dissolved your company, therefore you don't actually think Excel is the end of all conversations. Did you write an alternative? Why isn't the world using your alternative, if such a thing exists? Bearing in mind the huge market share of Excel, why are you overlooking the massive opportunity to clean up via your superior product? I wasn't making any claims about my project in this thread. But Needle and Google Refine are two examples of attempts to do data-management tools with more of a focus on cleanup and curation. What is a Data Tool? Again, 100% subjective. I don't think I know what you mean by the word subjective. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. And did you post a callout here or on Twitter or anywhere else for other folks to chime in? Yes, Danny asked the question on Twitter and on his blog. I saw it and answered it. Nobody else chimed in.
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 9:53 AM, glenn mcdonald wrote: On Tue, Apr 12, 2011 at 8:58 AM, Kingsley Idehen kide...@openlinksw.com wrote: 1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- basic description of 'Michael Jackson' from DBpedia The very first assertion on this, your first link, is 'is sameAs of: Michael Rodrick'. And you wonder why I keep distracting your technology demos by talking about data quality... In addition to my prior comments, you could have looked up the source of the subjectively errant assertion via its source named graph: http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2 . Or you could have just followed the link: http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fsw.opencyc.org%2F2008%2F06%2F10%2Fconcept%2FMx4rvWuBAJwpEbGdrcN5Y29ycA . Either way, you would come to realize: 1. The DBMS has many Named Graphs 2. The browser page in question scopes queries to all graphs 3. Nothing about this setup enforces owl:sameAs inference -- the reason why you have other links showing application of owl:sameAs reasoning to the data in question. As I've told you repeatedly, we have Named Rules and Named Graphs. In our world these parts are all loosely coupled so that humans and agents can pursue their desired world views. I am not trying to enforce anything on anyone via our technology. Basically, this is about showing the virtues of loosely coupling critical parts of this Linked Data ecosystem. BTW - we are already working with Yago2, ProductOntology, OpenCyc re. fixes to their DBpedia mappings. All part of a virtuous cycle driven by conversations about the data, with subjective enhancements via context lenses as the final destination. To conclude, finding the subjectively bad needle in the haystack is in and of itself immensely valuable with regard to any pursuit of subjective data quality. You can't fix what you don't know is broken. LOD is a large community, ditto DBpedia; nobody (as far as I know) has ever espoused the position that data quality is a no-go area. What I think people do espouse (I might be wrong) covertly is this: make your contribution rather than berate those already making contributions, however perfect or imperfect these contributions might be. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Wordnet Planets SPARQL Puzzle
BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource) The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point out the hand-waving in this join-via-URI-concatenation...
Re: Wordnet Planets SPARQL Puzzle
On 4/12/11 10:03 AM, Rob Vesse wrote: Hi Glenn, Interjecting into your email thread re Danny's SPARQL puzzle in case you hadn't seen my tweets to him today on this topic. On Tue, 12 Apr 2011 09:33:05 -0400, glenn mcdonald gl...@furia.com wrote: Which is what I was talking about in Boundedness: does the data have the properties you need to extract the subset you want. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. That was a data problem, not a tool problem. But the difficulty of figuring this out, /using/ the tools, was a tool problem. Here is a query that answers Danny's question (also online at http://pastebin.com/8juVLmCT). You'll need a SPARQL 1.1 engine to run this; if you don't have a local one to hand (or it doesn't support all the features I've used in the query, since some are only in the editors' drafts currently) then you can run this online at http://www.dotnetrdf.org/demos/leviathan/ . AFAIK this should also run under Jena's ARQ (you may need the latest snapshot) and should be runnable on sparql.org except that the site appears to be down at the moment. The query is a tad clunky because the RKB Explorer endpoints are SPARQL 1.0 only, so it has to be split into several sections because the local SPARQL engine has to do the MINUS bit. Once it has done the Wordnet bit it constructs a DBpedia resource URI and then uses an EXISTS filter over a SERVICE call to DBpedia to ensure that the resource is a Planet:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>

SELECT DISTINCT ?label
WHERE {
  SERVICE <http://wordnet.rkbexplorer.com/sparql/> {
    ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
    ?s1 rdfs:label ?label .
  }
  MINUS {
    SERVICE <http://wordnet.rkbexplorer.com/sparql/> {
      ?s2 wn:hyponymOf <http://wordnet.rkbexplorer.com/id/synset-Roman_deity-noun-1> .
      ?s2 rdfs:label ?label .
    }
  }
  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)
  FILTER(EXISTS {
    SERVICE <http://dbpedia.org/sparql> {
      ?dbpResource a <http://dbpedia.org/ontology/Planet> .
    }
  })
}

Regards, Rob Vesse -- PhD Student IAM Group Bay 20, Room 4027, Building 32 Electronics & Computer Science University of Southampton Rob, Nice! And if you change that into a SPARQL CONSTRUCT query you have a linkset that can be contributed back to DBpedia :-) -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
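A sketch of what Kingsley's CONSTRUCT suggestion might look like (untested, and making the same assumptions as Rob's query about a local SPARQL 1.1 engine performing the SERVICE calls). One editorial liberty: asserting owl:sameAs between a Wordnet synset and a DBpedia planet would be exactly the kind of over-strong link this thread complains about, so the sketch emits skos:closeMatch instead.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?s1 skos:closeMatch ?dbpResource }
WHERE {
  SERVICE <http://wordnet.rkbexplorer.com/sparql/> {
    ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
    ?s1 rdfs:label ?label .
  }
  # same join-by-name step as the SELECT version; its fragility comes up later in the thread
  BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)
  FILTER(EXISTS {
    SERVICE <http://dbpedia.org/sparql> {
      ?dbpResource a <http://dbpedia.org/ontology/Planet> .
    }
  })
}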
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 10:08 AM, glenn mcdonald wrote: But who ever told you, or implied to you, that any LOD demo is about the Complete Linked Data Experience, let alone the Complete Data Experience? I didn't capitalize those. A human's experience of data is the product of the underlying data and the tool/experience/interface through which they see it. Via their own inherently subjective context lenses. Excel the pathetic dominates the world of spreadsheets. Nuff said. And yet, you don't seem to have dissolved your company, therefore you don't actually think Excel is the end of all conversations. Don't get your point. We build data access, integration, and management technology. All spreadsheets are interesting to use as consumers and presenters of data. That's it. On your part, you claim Excel is pathetic. My question to you is: what's your alternative? How come it hasn't exploited the massive opportunity at hand? Bottom line, your subjective comments about Excel or any other product are unwarranted. Look, can't we just have a civil debate? Disagreements and debates are healthy in any realm. Did you write an alternative? Why isn't the world using your alternative, if such a thing exists? Bearing in mind the huge market share of Excel, why are you overlooking the massive opportunity to clean up via your superior product? I wasn't making any claims about my project in this thread. But Needle and Google Refine are two examples of attempts to do data-management tools with more of a focus on cleanup and curation. Google Refine != Excel. That isn't why Excel exists. This is one of those context infidelity examples again. My reference to Excel was about separating an application that can consume data from the technology that delivers data to it, and the actual originating sources of said data. Your response was to denigrate Excel, rather than attempt to grasp my point. What is a Data Tool? Again, 100% subjective. I don't think I know what you mean by the word subjective. Clearly not. And maybe therein lies the problem. Subjective implies your world view. Example: you see Google Refine vs Excel as an Apples vs Apples comparison re. Data Reconciliation matters. E.g., Danny Ayers yesterday was trying to make a SPARQL query for Wordnet that found the planets in the solar system that aren't named after Roman gods. But neither he nor I could find any way in the data to distinguish actual planets in the list of solar bodies, so we couldn't quite make it right. And did you post a callout here or on Twitter or anywhere else for other folks to chime in? Yes, Danny asked the question on Twitter and on his blog. I saw it and answered it. Nobody else chimed in. Twitter link? I know Rob Vesse has already chimed in with a suggestion, but the call-out link is still interesting :-) -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Wordnet Planets SPARQL Puzzle
On 4/12/11 10:19 AM, glenn mcdonald wrote: BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource) The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point out the hand-waving in this join-via-URI-concatenation... What now? You don't like the manner in which a solution has been constructed? What are you looking for here? -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 10:30 AM, glenn mcdonald wrote: In addition to my prior comments, you could have looked up the source of the subjectively errant assertion So you call Michael Jackson owl:sameAs Michael Rodrick a subjectively errant assertion? I definitely don't know what you mean by subjective. via its source named graph: http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2 . Or you could have just followed the link: http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fsw.opencyc.org%2F2008%2F06%2F10%2Fconcept%2FMx4rvWuBAJwpEbGdrcN5Y29ycA . I can't see how to tell from either link where the sameAs assertion connecting Jackson to Rodrick came from. Can you show me how to discern the provenance of that particular triple? http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2 . That's how you discern it's from OpenCyc, since each dataset is loaded into its own Named Graph. Even easier: follow the link, then copy the value of @href from About: XYZ.. or just click on the About: XYZ hyperlink and you'll find yourself in the OpenCyc data space :-) -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
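Glenn's provenance question also has a fairly direct answer in plain SPARQL, assuming (as Kingsley describes) a quad store in which each source dataset sits in its own named graph. A minimal sketch using the URIs mentioned in the thread; against the lod.openlinksw.com endpoint it should list the graph(s) responsible for the suspect sameAs link, though the exact graph names depend on how the instance was loaded:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?g ?other
WHERE {
  GRAPH ?g {                                  # ?g identifies the source dataset
    { dbr:Michael_Jackson owl:sameAs ?other }
    UNION
    { ?other owl:sameAs dbr:Michael_Jackson }
  }
}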
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 10:37 AM, glenn mcdonald wrote: On your part, you claim Excel is pathetic. No, I said that it's pathetic that Excel doesn't offer better tools for evaluating and improving data. Excel has always been extensible. You or anyone else can extend it. Thus, how can it be pathetic that Excel doesn't offer this feature when it's extremely extensible? The feature in question isn't core functionality in the eyes of Excel product developers. Bottom line, your subjective comments about Excel or any other product are unwarranted. Look, can't we just have a civil debate? Disagreements and debates are healthy in any realm. Not sure what to do with this pair of statements. Example: you see Google Refine vs Excel as an Apples vs Apples comparison re. Data Reconciliation matters. I said no such thing. I brought up Google Refine precisely because it's a different sort of thing than Excel. You brought it up in the context of Excel i.e., in response to the thread developing around your utterances that comprised the patterns 'Excel' and 'pathetic'. You are basically quibbling about Excel not being capable of the functionality delivered by Google Refine, or taking the position that it's pathetic that Excel lacks such functionality. You quibble about an inaccurate assertion between DBpedia and OpenCyc re. owl:sameAs. What point are you trying to make re. 'pathetic' and Excel with regard to Google Refine? -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Wordnet Planets SPARQL Puzzle
BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource) The 1.0/1.1 clunkiness is just temporary, but I feel obliged to point out the hand-waving in this join-via-URI-concatenation... What now? You don't like the manner in which a solution has been constructed? What are you looking for here? I really think you can figure out for yourself what's not so great about this solution. But to go ahead and state the obvious, this is concatenating wordnet's rdfs:label for these planets directly into a dbpedia URI. This will only work if the identifiers happen to line up exactly. Which they do in the case of these 8 (!) entities, but I wouldn't want to rely on that tactic in general.
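To spell out the fragility: the join only succeeds when the Wordnet label happens to equal the DBpedia local name character for character. Even the ordinary case of a multi-word label breaks it, since DBpedia local names use underscores. The sketch below is illustrative only (the labels in VALUES are invented for the example); it patches that one case with REPLACE and ENCODE_FOR_URI, but does nothing for differing spellings, punctuation, or disambiguated titles -- which is why shared identifiers or curated links are the sturdier answer.

SELECT ?label ?dbpResource
WHERE {
  VALUES ?label { "Pluto" "Halley's Comet" }   # one single-word label, one that needs rewriting
  BIND(URI(CONCAT("http://dbpedia.org/resource/",
                  ENCODE_FOR_URI(REPLACE(?label, " ", "_")))) AS ?dbpResource)
}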
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 10:59 AM, glenn mcdonald wrote: If you can't see the data there's nothing to fix, thus we end up in a subjective fool's paradise. Not sure who you're talking to here. I'm certainly not arguing against seeing the data. You continue to imply that seeing subjectively imperfect data projected via a data-oriented tool is problematic re. your total data experience world view. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Wordnet Planets SPARQL Puzzle
Stop quibbling, contribute a solution. As you know, but others might not, I work on www.needlebase.com, a graph-database project incubated at ITA and due to become part of Google any hour now. It takes a somewhat different approach to data representation and data curation than the RDF/OWL/SPARQL stack. It's free for personal uses and has free trials for commercial uses, so anybody is welcome to find out whether it's suited for their particular problems.
Fwd: Early Bird Registration - Second Annual VIVO Conference
Would it be great to have some folks from the LOD community at this conference to make sure that VIVO interfaces well with the rest of the LOD cloud? Tim Begin forwarded message: From: VIVO alici...@ufl.edu Date: 2011-04-12 8:35:47 EDT To: timbl+v...@w3.org Subject: Early Bird Registration - Second Annual VIVO Conference Reply-To: alici...@ufl.edu Second Annual VIVO Conference, August 24-26, 2011, Gaylord National, Washington D.C. Early Bird Registration: Registration is now open for the Second Annual VIVO Conference. The $350 Early Bird registration rate is available until May 27. You are welcome to register online or by fax/mail. Gaylord National is offering a $179 discounted room rate for VIVO conference attendees. This discount room rate is only available until July 24. Official Call for Papers: We are pleased to invite you to participate in this year's VIVO conference with contributions to the meeting. We request papers, panels and poster presentations which focus on issues that VIVO is trying to address. Abstracts are due June 1. Topics of interest: Collaboration, Semantic Web, Linked Open Data, Role of VIVO in Science, Adoption of VIVO, Ontologies, Implementation of VIVO, Crowd Sourcing, Mapping Networks, Research Discovery, Research Networking, VIVO Development, Using VIVO data. Submission: All submissions are handled electronically at EasyChair. For information on submission requirements, refer to the Official Call for Papers. Official Call for Apps: The conference is sponsoring a competition for applications using VIVO data to support science. Entries are due July 31. Refer to the Call for National Networking Applications for submission information, including eligibility, evaluation criteria and prizes. Workshops: The three-day conference begins with a full day of workshops on August 24. We are pleased to offer six half-day workshops at this year's conference. The workshops are designed for those new to VIVO, those implementing VIVO and those wishing to develop applications using VIVO. Each workshop is designed as a stand-alone session that can be mixed and matched depending on your interest in VIVO or your role within a VIVO implementation. Morning Workshops (August 24): Part I: Introduction to Development on the Open Source VIVO Project; Introduction to Implementation; Visualization in VIVO: A Case Study in How VIVO Data and Technology Can Be Used. Afternoon Workshops (August 24): Part II: Advanced Development on the Open Source VIVO Project; Creating your Marketing Outreach Plan; Data acquisition for VIVO: Extended Ingest by Example. Sponsors: Silver, Platinum and Gold Sponsor Packages are available for anyone interested in supporting the Second Annual VIVO Conference. In addition to Sponsor Packages, there are several Exclusive Opportunities for marketing and promotion at this year's VIVO conference. For more information, contact Sponsorship Manager, Alan Frankle at Designing Events +1-443-213-1950 or refer to the Sponsor Prospectus.
About VIVO: VIVO is an open source, open ontology, open process platform for hosting information about scientists and their interests, activities and accomplishments. VIVO supports open development and facilitates integration of science through simple, standard semantic web technologies. VIVO: Enabling National Network of Scientists is supported by NIH Award U24 RR029822. Learn more at vivoweb.org
Re: Wordnet Planets SPARQL Puzzle
On 4/12/11 11:16 AM, glenn mcdonald wrote: Stop quibbling, contribute a solution. As you know, but others might not, I work on www.needlebase.com, a graph-database project incubated at ITA and due to become part of Google any hour now. It takes a somewhat different approach to data representation and data curation than the RDF/OWL/SPARQL stack. It's free for personal uses and has free trials for commercial uses, so anybody is welcome to find out whether it's suited for their particular problems. Post a link showing how it solves the problems you've griped about without the data living in a silo. By this I mean, the data presentation pages and data sources should be loosely coupled. In addition, your Data Object Identifiers should resolve to Referent Representations (description graphs) via URLs. You do that and I'll retract my silo tag :-) If you have a dataset fix for Danny's problems (or any others you've stumbled across along the way) do share via a URL. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Wordnet Planets SPARQL Puzzle
If you have a dataset fix for Danny's problems (or any others you've stumbled across along the way) do share via a URL. Well, the problems in Danny's case were these:
- the required query path to connect gods to planets was non-obvious and not trivial to figure out by exploring
- doing negation in SPARQL 1.0 is clumsy
- the wordnet dataset lacked identification of actual planets
I solved the first problem by just poking around patiently. This kind of thing is easier and faster in Needle because the Needle explorer UI is configurable by the user, and can be extended by calculated fields. It might be interesting to load the Wordnet data into Needle. I haven't done that yet, and it's bigger than the limits on our free personal accounts, but if anybody wants to try it, let me know and I'll see if we can set up an account with higher limits for you. Negation is definitely better in SPARQL 1.1 than 1.0, so the obvious solution there is upgrading the server behind the wordnet dataset. The query would be simpler in Thread, but that's a different topic.** As for the actual-planet thing, what you really want there is some shared identifiers. Rob's query used one dataset's strings as parts of another dataset's identifiers, which is a hopeful approach. I see that dbpedia has links to opencyc IDs, and wordnet has links to an alternate wordnet URI set hosted at w3c.org, so maybe there's a link we could find by following those two chains further. Absent that, Needle's answer is to support human curation of the data, so we'd pull in both sets, cluster them for you, and let you confirm or reject the matches. I don't know what the administrative tools for the wordnet dataset look like, but I think their RDF version is an export, not the native form of the data, so there's no real comparison to be made there. **For the interested, the single-domain SPARQL query was this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}

where in Needle this same basic query idea might be done like this: Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity)
Re: 15 Ways to Think About Data Quality (Just for a Start)
You continue to imply that seeing subjectively imperfect data projected via a data oriented tool is problematic re., your total data experience world view. I continue to think it's hilarious that you consider it subjectively imperfect that your dataset says Michael Jackson and Michael Rodrick are the same person. What would constitute objectively imperfect to you? So yes, I think you should feel a little embarrassed about broadcasting links to a demo in which the very first piece of data one sees is obviously wrong. You've got billions of entities in dbpedia, and the technology doesn't care which one you pick, so surely you could pick one where the errors aren't as prominent. The fact that you didn't, and don't seem to care, sends a message about your attitude towards data.
Re: Wordnet Planets SPARQL Puzzle
On 4/12/11 1:39 PM, glenn mcdonald wrote: If you have a dataset fix for Danny's problems (or any others you've stumbled across along the way) do share via a URL. Well, the problems in Danny's case were these:
- the required query path to connect gods to planets was non-obvious and not trivial to figure out by exploring
- doing negation in SPARQL 1.0 is clumsy
- the wordnet dataset lacked identification of actual planets
I solved the first problem by just poking around patiently. This kind of thing is easier and faster in Needle because the Needle explorer UI is configurable by the user, and can be extended by calculated fields. It might be interesting to load the Wordnet data into Needle. I haven't done that yet, and it's bigger than the limits on our free personal accounts, but if anybody wants to try it, let me know and I'll see if we can set up an account with higher limits for you. Negation is definitely better in SPARQL 1.1 than 1.0, so the obvious solution there is upgrading the server behind the wordnet dataset. The query would be simpler in Thread, but that's a different topic.** As for the actual-planet thing, what you really want there is some shared identifiers. Rob's query used one dataset's strings as parts of another dataset's identifiers, which is a hopeful approach. I see that dbpedia has links to opencyc IDs, and wordnet has links to an alternate wordnet URI set hosted at w3c.org, so maybe there's a link we could find by following those two chains further. Absent that, Needle's answer is to support human curation of the data, so we'd pull in both sets, cluster them for you, and let you confirm or reject the matches. I don't know what the administrative tools for the wordnet dataset look like, but I think their RDF version is an export, not the native form of the data, so there's no real comparison to be made there. **For the interested, the single-domain SPARQL query was this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}

where in Needle this same basic query idea might be done like this: Synset:Solar System:!(.Hyponym.Sense.Word.Sense.Synset.Meronym:Roman Deity) Glenn, Great! We've achieved something here. You've shared your solution to a problem :-) Important note to others: Glenn and I aren't strangers, we've had these debates (sometimes heated) repeatedly in the past. The bridge I seek to cross with Glenn simply boils down to encouraging more of what he's done here (actual thread and this particular post) i.e., spot a problem and provide a solution that's ultimately a contribution to the general pool.
That (IMHO) is exponentially better than shooting down the efforts of others at first blush - intentionally or inadvertently. Glenn: I am 100% in agreement with human curation; I just refer to it as conversation about the data that becomes part of the data. Basically, doing today's Wikipedia dance as part of the provenance aspect of a given data space. In a different thread it's why I said: we ultimately want to be able to better discern the 'why' dimension of a who, what, when, and where than we can today; we'll never figure out 'why' 100%, but >0% is valuable in and of itself, etc. The subjectivity inherent in data quality is why we ultimately have to discuss our way to the construction of context lenses. All of this can happen in Linked Data form. No need for any Data Silos. Named Graphs, Named Rules, and the ability to calibrate context via a combination of reasoning and inference rules are integral components of the Linked Data mission; at least that's what I see via my subjective context lenses :-) Links: 1. http://lod.openlinksw.com/c/CV5SCWN -- your SPARQL query 2. http://lod.openlinksw.com/c/CYOT3KC -- SPARQL 1.1 variant 3. http://lod.openlinksw.com/c/CYGCJVN - DESCRIBE (using this via the raw /sparql endpoint will produce a graph in the format of your choice). We also have a linkset in the making that would simplify this quest next time around. That's what I call
Call For Posters and Demos: Making Sense of Microposts (#MSM2011)
*** CALL FOR POSTERS, DEMOS AND LATE-BREAKING WORK *** 1st Workshop on Making Sense of Microposts #MSM2011 7th Extended (former European) Semantic Web Conference ESWC 2010 30 May 2011 Heraklion, Greece *** http://research.hypios.com/msm2011/ *** The topic of Making Sense of Microposts generated a lot of interest in the Semantic Web research community, confirmed by 19 high quality paper submissions, out of which 9 were accepted. We are now opening a second call for a poster and demo track for presenting ideas, late-breaking results, ongoing research projects, and speculative or innovative work in progress. Posters and Demos are intended to provide authors and participants with the ability to connect with one another and to engage in discussions about the work. The call is intended for presentations of both work in research and industry. Authors are invited to submit a 2-page paper (PDF, Springer LNCS style [1]) with a separate abstract (up to 150 words). The paper must clearly demonstrate relevance to the #MSM2011 topics. Decisions about acceptance will be based on relevance to the Semantic Web area, originality, potential significance, topicality and clarity. The accepted posters and demos will be presented in the coffee breaks during the Workshop, thus giving the poster authors the opportunity to interact with other participants and obtain feedback on their work. The poster abstracts will be available on the Workshop website, but will not be included in the official proceedings. An award for the most innovative poster/demo will be given by the workshop chairs. For more information about the Springer's Lecture Notes in Computer Science (LNCS) please visit: [1] http://www.springer.com/computer/lncs?SGWID=0-164-7-72376-0 *Topics of Interest* We encourage submissions from, but not limited to, the following topics of interest: Microposts and Semantic Web technologies - Knowledge Discovery and Information Extraction - Factual Inference - Ontology/vocabulary modelling and learning from Microposts - Integrating Microposts into the Web of Linked Data Social/Web Science studies - Analysis of Micropost data patterns - Motivations for creating and consuming Microposts - Relevance of Microposts and factors that influence them - Community/network analysis of Micropost dynamics - Ethics/privacy implications of publishing and consuming Microposts Context - Utilising context (time, location, feeling) - Contextual inference mechanisms - Social awareness streams and Online Presence - Event Detection Applying Microposts - User profiling/recommendation/personalization approaches using Microposts - Public opinion mining - Trend prediction - Expertise finding - Business analysis/market scanning - Emergency systems - Urban sensing and location-based applications *Important Dates* Submission (Posters/Demos): May 1, 2011 (23:59 Hawaii time) Notification (Posters/Demos): May 15, 2011 Camera Ready (Posters/Demos): May 20, 2011 *Submission* Submission and reviewing of poster papers will be electronic, via the MSM2011 EasyChair installation at: https://www.easychair.org/conferences/?conf=msm2011 *General contact* If you have any questions, please contact: msm.org...@gmail.com *#MSM Chairs* Matthew Rowe Milan Stankovic Aba-Sah Dadzie Mariann Hardey
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 1:52 PM, glenn mcdonald wrote: You continue to imply that seeing subjectively imperfect data projected via a data-oriented tool is problematic re. your total data experience world view. I continue to think it's hilarious that you consider it subjectively imperfect that your dataset says Michael Jackson and Michael Rodrick are the same person. What would constitute objectively imperfect to you? The problem is this: it isn't my dataset. It's data loaded into an instance of Virtuoso. So yes, I think you should feel a little embarrassed about broadcasting links to a demo in which the very first piece of data one sees is obviously wrong. To you, the first piece of that is an owl:sameAs assertion. That's 100% fine for you, but that isn't true for everyone else. It just isn't. You've got billions of entities in dbpedia, and the technology doesn't care which one you pick, so surely you could pick one where the errors aren't as prominent. No, DBpedia doesn't have billions of entities; that's just one dataset. The Virtuoso instance in question is a LOD cloud cache instance i.e., we've loaded the available datasets into the instance. From that I produce a variety of demos. Just as anyone else can, since the endpoints are all public. The fact that you didn't, and don't seem to care, sends a message about your attitude towards data. Again, context infidelity. In due course you will understand my point. For now, we can go back and forth. Your characterization is 100% inaccurate. -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
So yes, I think you should feel a little embarrassed about broadcasting links to a demo in which the very first piece of data one sees is obviously wrong. To you the first piece of that is an owl:sameAs assertion. That's 100% fine for you, but that isn't true for everyone else. It just isn't. Why, is the page dynamically reconfigured for other people? I'm not saying first in some mushy philosophical sense, I'm talking about the first attribute that appears in the structured-data section of the page, right under the headings Attributes and Values. You've got billions of entities in dbpedia, and the technology doesn't care which one you pick, so surely you could pick one where the errors aren't as prominent. No, DBpedia doesn't have a billions of entities, that just one dataset. What? Whatever: you've got plenty of other entities, so surely you could pick one where the errors aren't as prominent. Here, for example, is the next one I tried: http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FTori_Amos There are some dubious bits to this, too (she only composed one song?** a person is subsequent work of a song?***), but at least this is a page about a person that appears to be about a single person. Same technology, better demo. In due course you will understand my point. Understood your points the first hundred times you stated them. Any time you'd like to take a turn understanding mine, feel free. You characterization is 100% inaccurate. In the context of your insistence on the subjectivity of everything, I assume this is intended as a joke. Funnier without the typo. **Completeness failure ***Modeling Correctness error
Discussion meta-comment
A recent thread included discussion of how to reply to postings. For what it's worth, I don't agree that the best way to reply to a posting about doing something in one system is to say: "Well, this is how I do it in my system." At its best, it is hard to understand what the respondent means, because it entails (at least for the original poster, who is looking for feedback on their system) working out what the respondent's system view is implicitly, using terms that the respondent finds comfortable but which are often alien to the poster. At its worst, the original message is completely lost, as the thread simply moves to a discussion of the respondent's system. It is far better if respondents try to communicate with the poster by addressing the post directly, using the poster's terms wherever they can. And it should certainly be acceptable to give the poster feedback, including comments that may seem negative as well as positive, without having another implementation or solution in your pocket. I, as well as others I know, find the culture that has developed on this list of responses saying "Well, this is how I do it" alienating, and thus sometimes a barrier to posting and genuine responses, and so actually stifles discussion. Happy to be told I am wrong, or in a tiny minority, without hearing any proposals for better solutions. :-) Hugh
Re: 15 Ways to Think About Data Quality (Just for a Start)
Nothing about the DBMS hosting the datasets (where each has a Named Graph IRI) prevents the beholder or consumer from achieving the following via the available data access endpoints: 1. Accessing and altering the source query or SPARQL protocol URL I tried clicking your OpenLink Data Explorer link to do this, and got a page with broken graphics and a frozen "loading..." indicator. Tried again and got to a Data Explorer page that says "0 records (0 triples, 0 properties) match selected filters. Nothing to display. Perhaps your filters are too restrictive?" So I'd say something is preventing the beholder from achieving this. 2. Adding or removing pragmas re. inference context (owl:sameAs expansion, invocation of fuzzy InverseFunctionalProperty rules, or a combination of both) as part of the view alteration quest outlined above I went to the Settings page to check this out, and found the owl:sameAs toggle. Of course, it's unchecked, despite all those sameAs relationships showing up, and when I check it they go away, so you've wired the setting backwards. Nice job. 3. Viewing original or actual query results via alternative tools that can process HTTP response payloads -- remember nothing about SPARQL mandates RDF as the sole query results format across SELECT, DESCRIBE, or CONSTRUCT queries 4. Sharing a new query, new result set, new data presentation etc. via a URL as part of an evolving conversation about the data in question. These are great. I support HTTP access, multiple formats, and URL-addressable queries/results/views. Remember, I do espouse the mantra: Data is like Wine while Application code is like Fish. A Good (Cool) URL or URI should be able to stand the test of time :-) Catchy.
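To make point 3 above concrete, here is a minimal sketch of asking a SPARQL protocol endpoint for the same SELECT result in several formats purely via HTTP content negotiation. It is a sketch, not either poster's code: the endpoint URL, the toy query, and the assumption that this endpoint honours these media types are assumptions.

    # Minimal sketch, assuming the LOD cache /sparql endpoint honours standard
    # SPARQL protocol parameters and Accept-header content negotiation.
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://lod.openlinksw.com/sparql"  # assumed from links in this thread
    QUERY = ("SELECT ?p ?o WHERE { "
             "<http://dbpedia.org/resource/Michael_Jackson> ?p ?o } LIMIT 10")

    def fetch(accept):
        """Run QUERY once, asking for a specific results format via Accept."""
        url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY})
        req = urllib.request.Request(url, headers={"Accept": accept})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    # Same query, three payloads: SPARQL does not mandate RDF as the sole
    # results format for SELECT.
    for media_type in ("application/sparql-results+json",
                       "application/sparql-results+xml",
                       "text/csv"):
        print(media_type, "->", len(fetch(media_type)), "bytes")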
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 3:02 PM, glenn mcdonald wrote: So yes, I think you should feel a little embarrassed about broadcasting links to a demo in which the very first piece of data one sees is obviously wrong. To you the first piece of that is an owl:sameAs assertion. That's 100% fine for you, but that isn't true for everyone else. It just isn't. Why, is the page dynamically reconfigured for other people? As per my latest post, it's just a point of view. You are now talking about UI aesthetics rather than data quality. The presentation layer is just that, a presentation layer. The Data layer is just that, a Data layer. I'm not saying "first" in some mushy philosophical sense, I'm talking about the first attribute that appears in the structured-data section of the page, right under the headings Attributes and Values. Because out of 21 billion+ records, why should the page order by the perceived quality of an assertion in an owl:sameAs relation? Why? Because it might bug you? Is there an inherent semantic in Links that infers: 1. Thou must click 2. Thou must click and infer 3. Thou must infer? Moreover, the issue with OpenCyc links to and from DBpedia (not performed by me or anyone at OpenLink Software) is something that is going to be resolved when OpenCyc releases a new linkset. There's absolutely nothing wrong with a page that immediately brings to attention misuse or dangerous use of owl:sameAs. You (as a cognitively endowed being) see the page in one context; that's fine. But others will also look at the same page and see things differently. This is the very basis of cognition. We are wired to see things differently. IMHO a clever feature inherited from our universe. Imagine if we could only observe the same limited dimensions of an observation subject? The presentation, i.e. the page, != a position about how I feel about data quality. It's just a presentation of data that's loosely coupled to its data sources. You can even take the source code of the page and tweak it for your specific needs if you like. That's what this is supposed to be about. I could start to understand your viewpoint if my presentation, data sources, etc. were imposed on you. That simply isn't the case, and that's 100% antithetical to the concept of Linked Data that I am particularly excited about, i.e., the loose coupling of knowledge, information, and data that inherently facilitates free remixing and sharing of data sources, queries, and presentation pages. You've got billions of entities in dbpedia, and the technology doesn't care which one you pick, so surely you could pick one where the errors aren't as prominent. No, DBpedia doesn't have billions of entities; that's just one dataset. What? Whatever: you've got plenty of other entities, so surely you could pick one where the errors aren't as prominent. Here, for example, is the next one I tried: http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FTori_Amos Again, I pick examples like 'Michael Jackson' because, like 'New York', 'Paris' etc., my focal point is/was: use of entity type and other attributes as a mechanism for disambiguating my quests for information about a specific entity, at massive scales. The aforementioned entity examples ultimately accentuate the challenge at hand. I won't drop triples in the OpenCyc Named Graph simply because of a few questionable relations potentially upsetting a few observers. I am more interested in real demos, and that means bad or questionable data warts are part of the package.
Exercises like this have triggered many a dataset fix in LOD land. You'd be quite surprised (bearing in mind your perception of my data quality values) how many dataset producers I've worked with re. data fixes across the ABox and TBox realms. There are some dubious bits to this, too (she only composed one song?** a person is the subsequent work of a song?***), but at least this is a page about a person that appears to be about a single person. Same technology, better demo. No, your demo of the same technology. That's a better characterization. Again, the inherent tone of your commentary continues to echo a contentious problem: you can always speak for yourself, just don't speak for me. We are individuals (in a !owl:sameAs relation). In due course you will understand my point. Understood your points the first hundred times you stated them. Any time you'd like to take a turn understanding mine, feel free. Open the door first, i.e., stop telling me about myself. We can have a conversation; we've had many in the past. All you have to do is open the door. You characterization is 100% inaccurate. In the context of your insistence on the subjectivity of everything, I assume this is intended as a joke. Funnier without the typo. **Completeness failure ***Modeling Correctness error Yes, LOL re. typo too. Here's a
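The disambiguation idea mentioned above (using entity type plus other attributes to pick out one 'Michael Jackson' among many) can be illustrated with a small, purely hypothetical query. The class IRI dbo:MusicalArtist, the English label, and the target endpoint are assumptions for illustration, not anything asserted in the thread.

    import urllib.parse

    # Hypothetical type-based disambiguation: the label alone is ambiguous,
    # the rdf:type constraint narrows it to the musician. dbo:MusicalArtist
    # and the @en label are illustrative assumptions.
    DISAMBIGUATION_QUERY = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT DISTINCT ?s WHERE {
      ?s rdfs:label "Michael Jackson"@en .
      ?s a dbo:MusicalArtist .
    }
    """

    # A shareable SPARQL protocol URL for the query (endpoint assumed).
    print("http://lod.openlinksw.com/sparql?" +
          urllib.parse.urlencode({"query": DISAMBIGUATION_QUERY}))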
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 3:25 PM, glenn mcdonald wrote: Nothing about the DBMS hosting the datasets (where each has a Named Graph IRI) prevents the beholder or consumer from achieving the following via the available data access endpoints: 1. Accessing and altering the source query or SPARQL protocol URL I tried clicking your OpenLink Data Explorer link to do this, and got a page with broken graphics and a frozen "loading..." indicator. Tried again and got to a Data Explorer page that says "0 records (0 triples, 0 properties) match selected filters. Nothing to display. Perhaps your filters are too restrictive?" So I'd say something is preventing the beholder from achieving this. Please post the URL in question so I can double check what's happening. Remember, I am sharing URLs across the Web; there are many factors in play re. the time-variant nature of resources, etc. Anyway, give me a URL and I can look into what might be happening. 2. Adding or removing pragmas re. inference context (owl:sameAs expansion, invocation of fuzzy InverseFunctionalProperty rules, or a combination of both) as part of the view alteration quest outlined above I went to the Settings page to check this out, and found the owl:sameAs toggle. Of course, it's unchecked, despite all those sameAs relationships showing up, and when I check it they go away, so you've wired the setting backwards. Nice job. To you, I've wired the setting backwards, i.e., I opted not to impose the overhead of owl:sameAs union expansion by default. Overhead in this case also includes what's ultimately your prime gripe: an unrepresentative graph, since the union is composed of attribute=value pairs from individuals that aren't the same. Methinks the defaults are fine. The worst that happens (without additional overhead) is that you click a value exposed via a broken owl:sameAs relation. The system doesn't reason unless you ask it to do so explicitly. Your world view != mine. Thus, don't try to impose *your* information expectations on *my* information projections. You can always make a different view. That's why loosely coupling information and data is vital. 3. Viewing original or actual query results via alternative tools that can process HTTP response payloads -- remember nothing about SPARQL mandates RDF as the sole query results format across SELECT, DESCRIBE, or CONSTRUCT queries 4. Sharing a new query, new result set, new data presentation etc. via a URL as part of an evolving conversation about the data in question. These are great. I support HTTP access, multiple formats, and URL-addressable queries/results/views. But you have a silo. The day you deliver Objects with IDs that resolve to their Representations via URLs is the day I'll drop the silo tag re. your data space :-) Remember, I do espouse the mantra: Data is like Wine while Application code is like Fish. A Good (Cool) URL or URI should be able to stand the test of time :-) Catchy. Yes, catchy 'cos it will catch on, courtesy of the burgeoning Web of Linked Data. -- Regards, Kingsley Idehen President & CEO, OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
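As a rough illustration of the inference-context point being argued here, the sketch below builds the same DESCRIBE twice: once plain, and once prefixed with Virtuoso's DEFINE input:same-as pragma, which requests owl:sameAs union expansion on a per-query basis rather than by default. The pragma is a Virtuoso-specific extension; whether this particular endpoint accepts it in exactly this form is an assumption.

    import urllib.parse

    SUBJECT = "http://dbpedia.org/resource/Michael_Jackson"
    ENDPOINT = "http://lod.openlinksw.com/sparql"  # assumed endpoint

    plain = "DESCRIBE <%s>" % SUBJECT
    # Virtuoso-specific pragma (assumed to be honoured here): expand the
    # description across everything in an owl:sameAs relation with the subject.
    expanded = 'DEFINE input:same-as "yes"\nDESCRIBE <%s>' % SUBJECT

    for label, query in (("no expansion", plain), ("sameAs union expansion", expanded)):
        print(label, "->", ENDPOINT + "?" + urllib.parse.urlencode({"query": query}))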
Re: 15 Ways to Think About Data Quality (Just for a Start)
Please post the URL in question so I can double check what's happening. Remember, I am sharing URLs across the Web; there are many factors in play re. the time-variant nature of resources, etc. Anyway, give me a URL and I can look into what might be happening. http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson To you, I've wired the setting backwards, i.e., I opted not to impose the overhead of owl:sameAs union expansion by default. No, this is not a "to you" thing. The checkbox is off, but the sameAs expansions *are* showing. I'm not arguing a philosophical point, I'm observing that you have a UI bug. These are great. I support HTTP access, multiple formats, and URL-addressable queries/results/views. But you have a silo. The day you deliver Objects with IDs that resolve to their Representations via URLs is the day I'll drop the silo tag re. your data space :-) I wasn't even talking about Needle, but that day came long ago. All Needle nodes have IDs that resolve to representations via URLs.
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 3:55 PM, glenn mcdonald wrote: Please post the URL in question so I can double check what's happening. Remember, I am sharing URLs across the Web; there are many factors in play re. the time-variant nature of resources, etc. Anyway, give me a URL and I can look into what might be happening. http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson To you, I've wired the setting backwards, i.e., I opted not to impose the overhead of owl:sameAs union expansion by default. No, this is not a "to you" thing. The checkbox is off, but the sameAs expansions *are* showing. I'm not arguing a philosophical point, I'm observing that you have a UI bug.

The link above doesn't correspond to any link I've sent to you re. owl:sameAs inference context. Basically, that's ODE, one of many browsers we offer. Its forte isn't showcasing owl:sameAs expansion. Here are the links I sent earlier:

1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- basic description of 'Michael Jackson' from DBpedia
2. http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- list of source named graphs in the host DBMS
3. http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=2 -- list of named graphs with triples that reference this subject
4. http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=3 -- explicit owl:sameAs relations across the entire DBMS (clicking on each Identifier will unveil the description graph for the Referent of said Identifier)
5. http://lod.openlinksw.com/fct/rdfdesc/usage.vsp?g=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&tp=4 -- use of an InverseFunctionalProperty-based rule to generate a fuzzy list of Identifiers that potentially share the same Referent (click on each link as per the prior step)
6. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&sas=yes -- inference context enhanced description of 'Michael Jackson' (this is a union expansion of all properties across all Identifiers in an owl:sameAs relation with the DBpedia Entity, hence the use of paging re. handling result set size)
7. http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson&sas=yes&p=6&lp=7&op=4&prev=&gp=6 -- Page 5 of 8 re. the enhanced description of 'Michael Jackson'.

I also sent the following links in response to your SPARQL solution to Danny's puzzle:

1. http://lod.openlinksw.com/c/CV5SCWN -- your SPARQL query
2. http://lod.openlinksw.com/c/CYOT3KC -- SPARQL 1.1 variant
3. http://lod.openlinksw.com/c/CYGCJVN -- DESCRIBE (using this via the raw /sparql endpoint will produce a graph in the format of your choice).

Your queries:

**For the interested, the single-domain SPARQL query was this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  OPTIONAL {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
  FILTER (!bound(?s2))
}

and in SPARQL 1.1 it could be simplified to (I think):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
PREFIX id: <http://wordnet.rkbexplorer.com/id/>
SELECT DISTINCT ?planet
WHERE {
  ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
  ?s1 rdfs:label ?planet .
  MINUS {
    ?s1 wn:containsWordSense ?ws1 .
    ?ws1 wn:word ?w .
    ?ws2 wn:word ?w .
    ?s2 wn:containsWordSense ?ws2 .
    ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
  }
}

These are great. I support HTTP access, multiple formats, and URL-addressable queries/results/views. But you have a silo. The day you deliver Objects with IDs that resolve to their Representations via URLs is the day I'll drop the silo tag re. your data space :-) I wasn't even talking about Needle, but that day came long ago. All Needle nodes have IDs that resolve to representations via URLs. Okay, what were you talking about? Specificity helps everyone, this is a public forum etc. -- Regards, Kingsley Idehen President & CEO, OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
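Since URL-addressable queries and results keep coming up, here is a hedged sketch of turning the SPARQL 1.1 variant above into one shareable protocol URL and reading back the JSON bindings. The choice of endpoint (the LOD cache /sparql), the format parameter, and the presence of the WordNet graph behind it are assumptions.

    import json
    import urllib.parse
    import urllib.request

    PLANETS_QUERY = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wn: <http://www.w3.org/2006/03/wn/wn20/schema/>
    PREFIX id: <http://wordnet.rkbexplorer.com/id/>
    SELECT DISTINCT ?planet WHERE {
      ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
      ?s1 rdfs:label ?planet .
      MINUS {
        ?s1 wn:containsWordSense ?ws1 .
        ?ws1 wn:word ?w .
        ?ws2 wn:word ?w .
        ?s2 wn:containsWordSense ?ws2 .
        ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
      }
    }
    """

    # One URL carries the query and the requested results format; sharing the
    # URL shares the result. Endpoint and format parameter are assumptions.
    params = {"query": PLANETS_QUERY, "format": "application/sparql-results+json"}
    url = "http://lod.openlinksw.com/sparql?" + urllib.parse.urlencode(params)
    print(url)

    with urllib.request.urlopen(url) as resp:
        bindings = json.load(resp)["results"]["bindings"]  # standard SPARQL JSON results layout
    for row in bindings:
        print(row["planet"]["value"])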
Re: Discussion meta-comment
+1 to your observation. And FWIW, I hesitated for literally 30 minutes before sending this message, deciding to say very little lest I get pulled into some philosophical debate myself :) Sent from my iPhone On Apr 12, 2011, at 12:10 PM, Hugh Glaser h...@ecs.soton.ac.uk wrote: A recent thread included discussion of how to reply to postings. For what it's worth, I don't agree that the best way to reply to a posting about doing something in one system is to say: "Well, this is how I do it in my system." At its best, it is hard to understand what the respondent means, because it entails (at least for the original poster, who is looking for feedback on their system) working out what the respondent's system view is implicitly, using terms that the respondent finds comfortable but which are often alien to the poster. At its worst, the original message is completely lost, as the thread simply moves to a discussion of the respondent's system. It is far better if respondents try to communicate with the poster by addressing the post directly, using the poster's terms wherever they can. And it should certainly be acceptable to give the poster feedback, including comments that may seem negative as well as positive, without having another implementation or solution in your pocket. I, as well as others I know, find the culture that has developed on this list of responses saying "Well, this is how I do it" alienating, and thus sometimes a barrier to posting and genuine responses, and so actually stifles discussion. Happy to be told I am wrong, or in a tiny minority, without hearing any proposals for better solutions. :-) Hugh
Re: 15 Ways to Think About Data Quality (Just for a Start)
http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson The link above doesn't correspond to any link I've sent to you re. owl:sameAs inference context. Basically, that's ODE, one of many browsers we offer. Its forte isn't showcasing owl:sameAs expansion. Here are the links I sent earlier: 1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- basic description of 'Michael Jackson' from DBpedia As I said already, go to this link and then click OpenLink Data Explorer at the bottom, hoping, as the message promises, to "Explore alternative Linked Data Views & Meshups". Is there another link somewhere to get to the SPARQL query behind the page? I don't see one. I wasn't even talking about Needle, but that day came long ago. All Needle nodes have IDs that resolve to representations via URLs. Okay, what were you talking about? Specificity helps everyone, this is a public forum etc. In Needle's Pazz & Jop music-poll dataset, the (relative) ID for Michael Jackson is 76337. Here's a URI for the Michael Jackson node in Needle's Pazz & Jop dataset: https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337 Here's a URL for seeing that node in Needle's UI: https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337&typeId=9149585060559937608&render=List Here's a URL for getting that data in JSON: https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false I make no claims about the prettiness of these.
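Out of curiosity about the JSON representation pointed at above, here is a minimal sketch of dereferencing it. The URL is trimmed to the parameters that look essential (domain, render, thread); whether the pruned parameters are actually optional, whether the resource resolves without authentication, and the shape of the returned JSON are all assumptions.

    import json
    import urllib.request

    # Trimmed form of the URL above; the omitted parameters may or may not be required.
    NEEDLE_JSON_URL = ("https://pub.needlebase.com/actions/api/V2Visualizer.do"
                       "?domain=Pazz-Jop&render=Jsv&thread=%4076337")

    with urllib.request.urlopen(NEEDLE_JSON_URL) as resp:
        node = json.load(resp)  # assumes a JSON object comes back
    print(sorted(node))         # list whatever top-level fields the representation exposes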
Re: Discussion meta-comment
On 4/12/11 4:33 PM, David Huynh wrote: I, as well as others I know, find the culture that has developed on this list of responses saying "Well, this is how I do it" alienating, and thus sometimes a barrier to posting and genuine responses, and so actually stifles discussion. David/Hugh, I get the point, but I don't know the comment's target, so I'll respond with regard to myself as one participant in today's extensive debate with Glenn. I hope I haven't said or implied "this is how we/I do it" without providing, at the very least, a link to what I am talking about? More than anything else, I believe tractable discussion and debate is good. That's the most predictable route (I know) to solving real problems. -- Regards, Kingsley Idehen President & CEO, OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
On 4/12/11 5:30 PM, glenn mcdonald wrote: http://linkeddata.uriburner.com/ode/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson The link above doesn't correspond to any link I've sent to you re. owl:sameAs inference context. Basically, that's ODE, one of many browsers we offer. Its forte isn't showcasing owl:sameAs expansion. Here are the links I sent earlier: 1. http://lod.openlinksw.com/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FMichael_Jackson -- basic description of 'Michael Jackson' from DBpedia As I said already, go to this link and then click OpenLink Data Explorer at the bottom, hoping, as the message promises, to "Explore alternative Linked Data Views & Meshups". Is there another link somewhere to get to the SPARQL query behind the page? I don't see one.

Link: 1. http://linkeddata.informatik.hu-berlin.de/uridbg/index.php?url=http%3A%2F%2Flod.openlinksw.com%2Fdescribe%2F%3Furi%3Dhttp%253A%252F%252Fdbpedia.org%252Fresource%252FMichael_Jackson+useragentheader=acceptheader= -- URI Debugger output showing you what's in <link/> and/or what you can extract via Link: response headers.

As for the sole ODE option, that's kinda confusing as it isn't a coherent segue in the grand scheme of things for the human user per se. What I mean by that is this: we should also include the following links in the footer:

1. an iSPARQL link that places you in the QBE or Advanced Mode tabs of our SPARQL Query Builder -- in this case you would see the DESCRIBE query rather than having to process the encoded URLs in the <head/> <link/> or Link: response headers
2. PivotViewer links that place you either in the PivotViewer description page or the SPARQL query editor
3. a raw /sparql endpoint page that, like #1, alleviates the tedium of decoding the encoded SPARQL protocol URLs.

Then via 1-3 you end up with more human-friendly routes to the SPARQL behind the page. I wasn't even talking about Needle, but that day came long ago. All Needle nodes have IDs that resolve to representations via URLs. Okay, what were you talking about? Specificity helps everyone, this is a public forum etc. In Needle's Pazz & Jop music-poll dataset, the (relative) ID for Michael Jackson is 76337.
Here's a URI for the Michael Jackson node in Needle's Pazz & Jop dataset: https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337 Here's a URL for seeing that node in Needle's UI: https://pub.needlebase.com/actions/visualizer/V2Visualizer.do?domain=Pazz-Jop&thread=%4076337&typeId=9149585060559937608&render=List Here's a URL for getting that data in JSON: https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false I make no claims about the prettiness of these. Not worried about the prettiness, etc. Are 'Michael Jackson' Object ID and Object Representation Access Address distinct? I already know that an HTTP GET against the Representation Address will return an EAV graph, once you loosen authentication requirements re. the JSON representation :-) BTW -- can I assume this is the actual URL that you intended above re. access to the JSON-based graph representation: https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false ? -- Regards, Kingsley Idehen President & CEO, OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: 15 Ways to Think About Data Quality (Just for a Start)
Are 'Michael Jackson' Object ID and Object Representation Access Address distinct? Yes. BTW -- can I assume this is the actual URL that you intended above re. access to the JSON-based graph representation: https://pub.needlebase.com/actions/api/V2Visualizer.do?domain=Pazz-Jop&render=Jsv&thread=%4076337&typeId=9149585060559937608&render=List&showAllDetails=false&defaultCol=false&showRejectedGroups=false&selectedColumns=ColuwCRkEpwkbzaLG6%2CColaNkcDjUUlijFYSv%2CColZ2NJlXSlrQKvP7W%2CColQcBNiWkLbAqCXcY%2CColeZBKuJQ41hg96q0%2CColMqmX4mWy1jw4dkP%2CCol3kQ0iFiMTlmghrr%2CCol6UVm8wlhDfgfNlY&startPage=1&showCheckboxes=false&rerun=false You can assume it if you'd like. Or you can just try it and see...