AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Hi Kingsley, Pat and all,

Chris/Anja: I believe this data set was touched on your end, right?

Yes, Anja will fix the file and will send an updated version.

Pat Hayes wrote: This website should be taken down immediately, before it does serious harm. It is irresponsible to publish such off-the-wall equivalentClass assertions.

Pat: Your comment seems to imply that you see the Semantic Web as something consistent that can be broken by individual information providers publishing false information. If this is the case, the Semantic Web will never fly! Everything on the Web is a claim by somebody. There are no facts, there is no truth, there are only opinions. Semantic Web applications must take this into account and therefore always assess data quality and trustworthiness before they do something with the data. If you build applications that break once somebody publishes false information, you are obviously doomed. As I thought this would be generally understood, I'm very surprised by your comment.

Cheers, Chris

-----Original Message----- From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On behalf of Kingsley Idehen Sent: Monday, 10 August 2009 23:29 To: Kavitha Srinivas Cc: Tim Finin; Anja Jentzsch; public-lod@w3.org; dbpedia-discuss...@lists.sourceforge.net; Chris Bizer Subject: Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

Kavitha Srinivas wrote: I will fix the URIs.. I believe the equivalentClass assertions were added in by someone at OpenLink (I just sent the raw file with the conditional probabilities for each pair of types that were above the .80 threshold). So can whoever uploaded the file fix the property to what Tim suggested?

Hmm, I didn't touch the file, neither did anyone else at OpenLink. I just downloaded what was uploaded at: http://wiki.dbpedia.org/Downloads33, and, based on my own personal best practices, put the data in a separate Named Graph :-)

Chris/Anja: I believe this data set was touched on your end, right? Please make the fixes in line with the findings from the conversation on this thread. Once corrected, I or someone else will reload.

Kingsley

Thanks! Kavitha

On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:

Kavitha Srinivas wrote: Agree completely -- which is why I sent a base file which had the conditional probabilities, the mapping, and the values to be able to compute marginals. About the URIs, I should have added in my email that because Freebase types are not URIs, and have types such as /people/person, we added a base URI: http://freebase.com to the types. Sorry I missed mentioning that... Kavitha

Kavitha, If you apply the proper URIs, and then apply fixes to the mappings (from prior suggestions) we are set. You can send me another dump and I will go one step further and put some sample SPARQL queries together which demonstrate how we can have many world views on the Web of Linked Data without anyone getting hurt in the process :-)

Kingsley

On Aug 10, 2009, at 4:42 PM, Tim Finin wrote:

Kavitha Srinivas wrote: I understand what you are saying -- but some of this reflects the way types are associated with Freebase instances. The types are more like 'tags' in the sense that there is no hierarchy, but each instance is annotated with multiple types. So an artist would in fact be annotated with person reliably (and probably less consistently with /music/artist). Similar issues with Uyghurs, murdered children etc. The issue is differences in modeling granularity as well.
Perhaps a better thing to look at are types where the YAGO types map to WordNet (this is usually at a coarser level of granularity).

One way to approach this problem is to use a framework to mix logical constraints with probabilistic ones. My colleague Yun Peng has been exploring integrating data backed by OWL ontologies with Bayesian information, with applications for ontology mapping. See [1] for recent papers on this as well as a recent PhD thesis [2] that I think also may be relevant.

[1] http://ebiquity.umbc.edu/papers/select/search/html/613a353a7b693a303b643a37383b693a313b643a303b693a323b733a303a3b693a333b733a303a3b693a343b643a303b7d/
[2] http://ebiquity.umbc.edu/paper/html/id/427/Constraint-Generation-and-Reasoning-in-OWL

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Hi Kingsley, hi Kavitha,

we will generate a new dump covering all YAGO-Freebase relations with at least 95% probability, using skos:narrower as proposed by Tim. I will let you know when the file is available for download.

Cheers, Anja

Kingsley Idehen wrote: Kavitha Srinivas wrote: I will fix the URIs.. I believe the equivalentClass assertions were added in by someone at OpenLink (I just sent the raw file with the conditional probabilities for each pair of types that were above the .80 threshold). So can whoever uploaded the file fix the property to what Tim suggested?

Hmm, I didn't touch the file, neither did anyone else at OpenLink. I just downloaded what was uploaded at: http://wiki.dbpedia.org/Downloads33, and, based on my own personal best practices, put the data in a separate Named Graph :-)

Chris/Anja: I believe this data set was touched on your end, right? Please make the fixes in line with the findings from the conversation on this thread. Once corrected, I or someone else will reload.

Kingsley

Thanks! Kavitha

On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:

Kavitha Srinivas wrote: Agree completely -- which is why I sent a base file which had the conditional probabilities, the mapping, and the values to be able to compute marginals. About the URIs, I should have added in my email that because Freebase types are not URIs, and have types such as /people/person, we added a base URI: http://freebase.com to the types. Sorry I missed mentioning that... Kavitha

Kavitha, If you apply the proper URIs, and then apply fixes to the mappings (from prior suggestions) we are set. You can send me another dump and I will go one step further and put some sample SPARQL queries together which demonstrate how we can have many world views on the Web of Linked Data without anyone getting hurt in the process :-)

Kingsley

On Aug 10, 2009, at 4:42 PM, Tim Finin wrote:

Kavitha Srinivas wrote: I understand what you are saying -- but some of this reflects the way types are associated with Freebase instances. The types are more like 'tags' in the sense that there is no hierarchy, but each instance is annotated with multiple types. So an artist would in fact be annotated with person reliably (and probably less consistently with /music/artist). Similar issues with Uyghurs, murdered children etc. The issue is differences in modeling granularity as well. Perhaps a better thing to look at are types where the YAGO types map to WordNet (this is usually at a coarser level of granularity).

One way to approach this problem is to use a framework to mix logical constraints with probabilistic ones. My colleague Yun Peng has been exploring integrating data backed by OWL ontologies with Bayesian information, with applications for ontology mapping. See [1] for recent papers on this as well as a recent PhD thesis [2] that I think also may be relevant.

[1] http://ebiquity.umbc.edu/papers/select/search/html/613a353a7b693a303b643a37383b693a313b643a303b693a323b733a303a3b693a333b733a303a3b693a343b643a303b7d/
[2] http://ebiquity.umbc.edu/paper/html/id/427/Constraint-Generation-and-Reasoning-in-OWL

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
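A minimal sketch of what the regenerated dump Anja describes could look like, assuming the input is a list of (YAGO class URI, Freebase type key, conditional probability) tuples. The 0.95 cut-off, the http://freebase.com base URI and skos:narrower come from the thread; the example URIs, the chosen direction of skos:narrower and everything else are illustrative only.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import SKOS

FREEBASE_BASE = "http://freebase.com"  # base URI prefixed to keys like /people/person

# Illustrative input rows: (YAGO class URI, Freebase type key, conditional probability)
pairs = [
    ("http://dbpedia.org/class/yago/Person100007846", "/people/person", 0.97),
    ("http://dbpedia.org/class/yago/Artist109812338", "/music/artist", 0.62),
]

g = Graph()
g.bind("skos", SKOS)

for yago_uri, freebase_key, prob in pairs:
    if prob < 0.95:  # keep only the high-confidence mappings
        continue
    # skos:narrower rather than owl:equivalentClass, as proposed in the thread;
    # the direction used here is illustrative only
    g.add((URIRef(yago_uri), SKOS.narrower, URIRef(FREEBASE_BASE + freebase_key)))

print(g.serialize(format="nt"))
```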
Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Kavitha Srinivas wrote: Actually I wonder if it makes sense to annotate the relation with the actual conditional probability as suggested by Tim. We found the probabilities quite useful in different applications. If we go this route we could also send you the set of types with very low probabilities -- which is very useful if you want to know for instance that an instance is almost never both a Car and a Person.

Making the conditional probabilities available in some form (not necessarily RDF) is a great idea. People could use this data in many ways, I think. It's less clear to me how this might be integrated into DBpedia or the LOD cloud. But having the data available should facilitate experimentation with how to best use it.
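One possible shape for the probability-annotated links Kavitha and Tim discuss, using plain RDF reification; the ex:conditionalProbability property and the URIs are invented here purely for illustration, since the thread does not settle on a vocabulary.

```python
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS, XSD

EX = Namespace("http://example.org/linkstats#")  # made-up vocabulary for the annotation

g = Graph()
yago = URIRef("http://dbpedia.org/class/yago/Person100007846")
freebase = URIRef("http://freebase.com/people/person")

# The link itself
g.add((yago, SKOS.narrower, freebase))

# A reified copy of the link, carrying the conditional probability
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, yago))
g.add((stmt, RDF.predicate, SKOS.narrower))
g.add((stmt, RDF.object, freebase))
g.add((stmt, EX.conditionalProbability, Literal(0.97, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```

The low-probability pairs (the "almost never both a Car and a Person" cases) could be published the same way, with a probability close to zero.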
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote: Hi Kingsley, Pat and all, Chris/Anja: I believe this data set was touched on your end, right? Yes, Anja will fix the file and will send an updated version.

Thanks.

Pat Hayes wrote: This website should be taken down immediately, before it does serious harm. It is irresponsible to publish such off-the-wall equivalentClass assertions.

Pat: Your comment seems to imply that you see the Semantic Web as something consistent that can be broken by individual information providers publishing false information. If this is the case, the Semantic Web will never fly!

Agreed, but surely we can expect something better than this. We will of course need to have ways (not yet elucidated) of locating the sources of inconsistencies and correcting or avoiding them. In the meantime, many of us are worrying about how to achieve mutual consistency between rival high-level ontologies.

Everything on the Web is a claim by somebody. There are no facts, there is no truth, there are only opinions.

Same is true of the Web and of life in general, but still there are laws about slander, etc.; and outrageous falsehoods are rebutted or corrected (e.g. look at how Wikipedia is managed); or else their source is widely treated as nonsensical, which I hardly think DBpedia wishes to be. And also, I think we do have some greater responsibility to give our poor dumb inference engines a helping hand, since they have no common sense to help them sort out the wheat from the chaff, unlike our enlightened human selves.

Semantic Web applications must take this into account and therefore always assess data quality and trustworthiness before they do something with the data.

In a perfect world, but in practice this isn't possible. There are no criteria yet available for making such judgements, or even for locating the true source of a discovered inconsistency. About the only way to do it is to judge the veracity of the source; and if one cannot trust DBpedia to not say blatant falsehoods, who can you trust? And I would draw a distinction between what one might call fact-level disagreements (about the population of India, say) and high- or mid-level problems, which are much harder to deal with. Introducing gratuitous, wildly false claims into the upper middle levels of a class hierarchy is liable to produce a very large number of inconsistencies down the line which will be very hard to identify and very hard to correct. They may appear as apparent errors in instance data, for example.

If you build applications that break once somebody publishes false information, you are obviously doomed.

Of course, but there are degrees of falsehood. To assert that hundreds of dissimilar, mid-level ontological categories are all identical is the most egregious kind of falsehood. In fact it's not really a falsehood: it was simply a mistake. Nobody actually thought these classes were equal in extent, not for a second. They just didn't know, or perhaps didn't care, what 'equivalentClass' means. Hence my rather strongly worded protest. The subtext was: please understand, and pay attention to, what the relations in your assertions mean. They are not just vague links in a vaguely defined associative network. But in any case, thanks to the workers for the rapid repair response.

Pat

As I thought this would be generally understood, I'm very surprised by your comment.
Cheers, Chris

-----Original Message----- From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On behalf of Kingsley Idehen Sent: Monday, 10 August 2009 23:29 To: Kavitha Srinivas Cc: Tim Finin; Anja Jentzsch; public-lod@w3.org; dbpedia-discuss...@lists.sourceforge.net; Chris Bizer Subject: Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

Kavitha Srinivas wrote: I will fix the URIs.. I believe the equivalentClass assertions were added in by someone at OpenLink (I just sent the raw file with the conditional probabilities for each pair of types that were above the .80 threshold). So can whoever uploaded the file fix the property to what Tim suggested?

Hmm, I didn't touch the file, neither did anyone else at OpenLink. I just downloaded what was uploaded at: http://wiki.dbpedia.org/Downloads33, and, based on my own personal best practices, put the data in a separate Named Graph :-)

Chris/Anja: I believe this data set was touched on your end, right? Please make the fixes in line with the findings from the conversation on this thread. Once corrected, I or someone else will reload.

Kingsley

Thanks! Kavitha

On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:

Kavitha Srinivas wrote: Agree completely -- which is why I sent a base file which had the conditional probabilities, the mapping, and the values to be able to compute marginals. About the URIs, I
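To make Pat's point about the meaning of the relation concrete: under OWL semantics a single wrong owl:equivalentClass axiom silently retypes every instance of both classes. A small sketch, assuming the owlrl package for the OWL-RL closure; the class and instance URIs are made up.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF
import owlrl

EX = Namespace("http://example.org/demo#")

g = Graph()
g.add((EX.Car, OWL.equivalentClass, EX.Person))  # the kind of off-the-wall assertion under discussion
g.add((EX.myFiat, RDF.type, EX.Car))

# Materialise the OWL-RL consequences
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# One wrong axiom, and instance-level conclusions go wrong downstream:
print((EX.myFiat, RDF.type, EX.Person) in g)  # True
```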
Re: [HELP] Can you please update information about your dataset?
Hi,

I've just added several new datasets to the Statistics page that weren't previously listed. It's not really a great user experience editing the wiki markup and manually adding up the figures. So, thinking out loud, I'm wondering whether it might be more appropriate to use a Google spreadsheet and one of their submission forms for the purpose of collecting the data. A little manual editing to remove duplicates might make managing this data a little easier. Especially as there are also pages that separately list the available SPARQL endpoints and RDF dumps.

I'm sure we could create something much better using voiD, etc. but for now, maybe using a slightly better tool would give us a little more progress? It'd be a snip to dump out the Google Spreadsheet data programmatically too, which'd be another improvement on the current situation.

What does everyone else think?

Cheers, L.

2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk:

Dear all, We are planning to produce an updated data cloud diagram based on the dataset information on the esw wiki page: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

If you have not published your dataset there yet and you would like your dataset to be included, can you please add your dataset there? If you have an entry there for your dataset already, can you please update information about your dataset on the wiki? If you cannot edit the wiki page any more because of the recent update of the esw wiki editing policy, you can send the information to me or Anja, who is cc'ed. We can update it for you. If you know your friends have datasets on the wiki, but are not on the mailing list, can you please kindly forward this email to them? We would like to get the data cloud as up-to-date as possible.

For this release, we will use the above wiki page as the information gathering point. We do apologize if you have published information about your dataset on other web pages and this request would mean extra work for you.

Many thanks for your contributions! Kindest regards, Jun

-- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
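As a rough illustration of the "dump out the Google Spreadsheet data programmatically" step, a published sheet can usually be fetched as CSV and turned into RDF. The export URL pattern, the column names and the use of voiD terms below are all assumptions, not a worked-out design.

```python
import csv
import io
import urllib.request

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS, XSD

VOID = Namespace("http://rdfs.org/ns/void#")
SHEET_CSV = "https://docs.google.com/spreadsheets/d/<sheet-key>/export?format=csv"  # placeholder

g = Graph()
g.bind("void", VOID)

with urllib.request.urlopen(SHEET_CSV) as resp:
    reader = csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8"))
    for row in reader:  # assumed columns: name, homepage, triples
        dataset = URIRef(row["homepage"])
        g.add((dataset, RDFS.label, Literal(row["name"])))
        g.add((dataset, VOID.triples, Literal(int(row["triples"]), datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```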
Re: ANN: D2R Server and D2RQ V0.7 released
Christian Becker wrote: Hi all, we are happy to announce the release of D2R Server and D2RQ Version 0.7 and recommend all users to replace old installations with the new release.

Version 0.7 provides:
- Several bugfixes
- Support for Microsoft SQL Server
- Support for dynamic properties (by Jörg Henß)
- Support for limits on property bridge level (by Matthias Quasthoff)
- Better dump performance
- New optimizations that must be enabled using D2R Server's --fast switch or using d2rq:useAllOptimizations

More information about the tools can be found on the
1. D2RQ Platform website: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
2. D2R Server website: http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

The new releases can be downloaded from Sourceforge http://sourceforge.net/projects/d2rq-map/

Lots of thanks for their magnificent work to:
- Andreas Langegger and Herwig Leimer for continued improvements of the D2RQ engine
- Richard Cyganiak for input on several design issues
- Jörg Henß for adding support for dynamic properties
- Matthias Quasthoff for adding limit support on property bridge level (d2rq:limit, d2rq:limitInverse, d2rq:orderAsc and d2rq:orderDesc)
- Alistair Miles for the patch for SQL cursor support

Cheers, Christian

Chris, Why isn't D2R JDBC and/or ODBC based, in a generic sense? Both APIs provide enough metadata-oriented APIs for enabling a more RDBMS-agnostic variant of D2R. As I am sure you can imagine, my hair stands (literally) whenever I encounter RDBMS-specific client apps. :-)

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
Re: ANN: D2R Server and D2RQ V0.7 released
Kingsley, On 11 Aug 2009, at 18:45, Kingsley Idehen wrote: Why isn't D2R JDBC and/or ODBC based, in a generic sense? Both APIs provide enough Metadata oriented APIs for enabling a more RDBMS agnostic variant of D2R. Glad to inform you that D2RQ is JDBC based, and has been since 2004. Best, Richard As I am sure you can imagine, my hair stands (literally) whenever I encounter RDBMS specific client apps. :-) -- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
Re: [HELP] Can you please update information about your dataset?
Hi,

On Aug 11, 2009, at 13:46, Kingsley Idehen kide...@openlinksw.com wrote:

Leigh Dodds wrote: Hi, I've just added several new datasets to the Statistics page that weren't previously listed. It's not really a great user experience editing the wiki markup and manually adding up the figures. So, thinking out loud, I'm wondering whether it might be more appropriate to use a Google spreadsheet and one of their submission forms for the purpose of collecting the data. A little manual editing to remove duplicates might make managing this data a little easier. Especially as there are also pages that separately list the available SPARQL endpoints and RDF dumps. I'm sure we could create something much better using voiD, etc. but for now, maybe using a slightly better tool would give us a little more progress? It'd be a snip to dump out the Google Spreadsheet data programmatically too, which'd be another improvement on the current situation. What does everyone else think?

Nice idea! Especially as Google Spreadsheet to RDF is just about RDFizers for the Google Spreadsheet API :-)

Hehe. I have this in my todo (literally). A website that exposes a Google spreadsheet as a SPARQL endpoint. Internally we use it as a UI to quickly create config files et al. But it will remain in my todo forever... ;)

Kingsley, this could be sponged. The trick is that the spreadsheet must have an accompanying page/sheet/book with metadata (the NS or explicit URIs for cols).

Kingsley

Cheers, L.

2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk:

Dear all, We are planning to produce an updated data cloud diagram based on the dataset information on the esw wiki page: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics If you have not published your dataset there yet and you would like your dataset to be included, can you please add your dataset there? If you have an entry there for your dataset already, can you please update information about your dataset on the wiki? If you cannot edit the wiki page any more because of the recent update of the esw wiki editing policy, you can send the information to me or Anja, who is cc'ed. We can update it for you. If you know your friends have datasets on the wiki, but are not on the mailing list, can you please kindly forward this email to them? We would like to get the data cloud as up-to-date as possible. For this release, we will use the above wiki page as the information gathering point. We do apologize if you have published information about your dataset on other web pages and this request would mean extra work for you. Many thanks for your contributions! Kindest regards, Jun

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
Re: [HELP] Can you please update information about your dataset?
Hi Michael, I have taken this dataset off the list. I also believe that Yves has managed to update the record about BBC music thanks to Michael H.:) cheers, Jun Michael Smethurst wrote: Hi Jun/all Just noticed the line on: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics saying: BBC Later + TOTP (link not responding - 2009-04-01) That's my bad. The site's been down since we forgot to pay our ec2 bills :-/ Having said that the data has either moved or is in the process of moving to BBC programmes and BBC music so TOTP/Later should probably come off the cloud piccie and off the list and BBC Music should be added in linking to musicbrainz and bbc programmes. The TOTP/Later site was only ever intended as a try out and a demo to management types at the BBC. Strangely it seems to have worked... :-) Also to note that BBC programmes and music do now have a joint sparql endpoint - well 2 in fact: http://api.talis.com/stores/bbc-backstage http://bbc.openlinksw.com/sparql Sorry for not notifying earlier Michael -Original Message- From: public-lod-requ...@w3.org on behalf of Jun Zhao Sent: Fri 8/7/2009 7:05 PM To: public-lod@w3.org Cc: Anja Jentzsch Subject: [HELP] Can you please update information about your dataset? Dear all, We are planning to produce an updated data cloud diagram based on the dataset information on the esw wiki page: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics If you have not published your dataset there yet and you would like your dataset to be included, can you please add your dataset there? If you have an entry there for your dataset already, can you please update information about your dataset on the wiki? If you cannot edit the wiki page any more because the recent update of esw wiki editing policy, you can send the information to me or Anja, who is cc'ed. We can update it for you. If you know your friends have dataset on the wiki, but are not on the mailing list, can you please kindly forward this email to them? We would like to get the data cloud as up-to-date as possible. For this release, we will use the above wiki page as the information gathering point. We do apologize if you have published information about your dataset on other web pages and this request would mean extra work for you. Many thanks for your contributions! Kindest regards, Jun http://www.bbc.co.uk This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Re: [HELP] Can you please update information about your dataset?
Please no! Not another manual entry system. I had already decided I just haven't got the time to manually maintain this constantly changing set of numbers, so would not be responding to the request to update. (In fact, the number of different places that a good LD citizen has to put their data into the esw wiki is really rather high.) Last time Anja was kind enough to put a lot of effort into processing the graphviz for us to generate the numbers, but this is not the way to do it. In our case, we have 39 different stores, with linkages between them and to others outside. There are therefore 504 numbers to represent the linkage, although they don't all meet a threshold. For details of the linkage in rkbexplorer see pictures at http://www.rkbexplorer.com/linkage/ or query http://void.rkbexplorer.com/ . And these figures are constantly changing, as the system identifies more - there can be more than 1000 a day. If any more work is to be put into generating this picture, it really should be from voiD descriptions, which we already make available for all our datasets. And for those who want to do it by hand, a simple system to allow them to specify the linkage using voiD would get the entry into a format for the voiD processor to use (I'm happy to host the data if need be). Or Aldo's system could generate its RDF using the voiD ontology, thus providing the manual entry system? I know we have been here before, and almost got to the voiD processor thing:- please can we try again? Best Hugh On 11/08/2009 19:00, Aldo Bucchi aldo.buc...@gmail.com wrote: Hi, On Aug 11, 2009, at 13:46, Kingsley Idehen kide...@openlinksw.com wrote: Leigh Dodds wrote: Hi, I've just added several new datasets to the Statistics page that weren't previously listed. Its not really a great user experience editing the wiki markup and manually adding up the figures. So, thinking out loud, I'm wondering whether it might be more appropriate to use a Google spreadsheet and one of their submission forms for the purposes of collectively the data. A little manual editing to remove duplicates might make managing this data a little more easier. Especially as there are also pages that separately list the available SPARQL endpoints and RDF dumps. I'm sure we could create something much better using Void, etc but for now, maybe using a slightly better tool would give us a little more progress? It'd be a snip to dump out the Google Spreadsheet data programmatically too, which'd be another improvement on the current situation. What does everyone else think? Nice Idea! Especially as Google Spreadsheet to RDF is just about RDFizers for the Google Spreadsheet API :-) Hehe. I have this in my todo (literally). A website that exposes a google spreadsheet as SPARQL endpoint. Internally we use it as UI to quickly create config files et Al. But It will remain in my todo forever...;) Kingsley, this could be sponged. The trick is that the spreadsheet must have an accompanying page/sheet/book with metadata (the NS or explicit URIs for cols). Kingsley Cheers, L. 2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk: Dear all, We are planning to produce an updated data cloud diagram based on the dataset information on the esw wiki page: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics If you have not published your dataset there yet and you would like your dataset to be included, can you please add your dataset there? 
If you have an entry there for your dataset already, can you please update information about your dataset on the wiki? If you cannot edit the wiki page any more because of the recent update of the esw wiki editing policy, you can send the information to me or Anja, who is cc'ed. We can update it for you. If you know your friends have datasets on the wiki, but are not on the mailing list, can you please kindly forward this email to them? We would like to get the data cloud as up-to-date as possible. For this release, we will use the above wiki page as the information gathering point. We do apologize if you have published information about your dataset on other web pages and this request would mean extra work for you. Many thanks for your contributions! Kindest regards, Jun

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com
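For readers unfamiliar with the voiD route Hugh argues for in this message, the idea is that each publisher describes its datasets and link sets in RDF and the statistics/diagram are then generated from those descriptions rather than from hand-edited wiki tables. A minimal sketch using the voiD vocabulary (property names as in the current voiD namespace); all URIs and the triple count are illustrative.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, XSD

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("void", VOID)

source = URIRef("http://example.org/void#my-dataset")           # illustrative
target = URIRef("http://example.org/void#dbpedia")
linkset = URIRef("http://example.org/void#my-dataset-to-dbpedia")

for ds in (source, target):
    g.add((ds, RDF.type, VOID.Dataset))
g.add((source, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))

# One link set per (source, target, predicate) combination
g.add((linkset, RDF.type, VOID.Linkset))
g.add((linkset, VOID.subjectsTarget, source))
g.add((linkset, VOID.objectsTarget, target))
g.add((linkset, VOID.linkPredicate, OWL.sameAs))
g.add((linkset, VOID.triples, Literal(1234, datatype=XSD.integer)))  # number of links, made up

print(g.serialize(format="turtle"))
```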
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote: On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote: Hi Kingsley, Pat and all, snip/

Everything on the Web is a claim by somebody. There are no facts, there is no truth, there are only opinions.

Same is true of the Web and of life in general, but still there are laws about slander, etc.; and outrageous falsehoods are rebutted or corrected (e.g. look at how Wikipedia is managed); or else their source is widely treated as nonsensical, which I hardly think DBpedia wishes to be. And also, I think we do have some greater responsibility to give our poor dumb inference engines a helping hand, since they have no common sense to help them sort out the wheat from the chaff, unlike our enlightened human selves.

Semantic Web applications must take this into account and therefore always assess data quality and trustworthiness before they do something with the data.

I think that this discussion really emphasises how bad it is to put this co-ref data in the same store as the other data. Finding data in DBpedia that is mistaken/wrong/debatable undermines the whole project - the contract DBpedia offers is to reflect the Wikipedia content. And it isn't really sensible/possible to distinguish the extra sameAs from the real sameAs. E.g. http://dbpedia.org/resource/London and http://dbpedia.org/resource/Leondeon

And on the other hand, Freebase is now in danger of being undermined by this as well. The more time goes by, the more I think this is going wrong.

Best Hugh

truncate/
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Hi Kingsley. On 12/08/2009 00:28, Kingsley Idehen kide...@openlinksw.com wrote: Hugh Glaser wrote: On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote: On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote: Hi Kingsley, Pat and all, snip/ Everything on the Web is a claim by somebody. There are no facts, there is no truth, there are only opinions. Same is true of the Web and of life in general, but still there are laws about slander, etc.; and outrageous falsehoods are rebutted or corrected (eg look at how Wikipedia is managed); or else their source is widely treated as nonsensical, which I hardly think DBpedia wishes to be. And also, I think we do have some greater responsibility to give our poor dumb inference engines a helping hand, since they have no common sense to help them sort out the wheat from the chaff, unlike our enlightened human selves. Semantic Web applications must take this into account and therefore always assess data quality and trustworthiness before they do something with the data. I think that this discussion really emphasises how bad it is to put this co-ref data in the same store as the other data. Yes, they should be in distinct Named Graphs. I thought you would mention Named Graphs :-) This is the point I was making a while back (in relation to Alan's comments about the same thing). Yes, but this is the point I was making a while back about Named Graphs as a solution - when I resolve a URI (follow-my-nose) in the recommended fashion, I see no Named Graphs - they are only exposed in SPARQL stores. If I resolve http://dbpedia.org/resource/London to get http://dbpedia.org/data/London.rdf I see a bunch of RDF - go on, try it. No sight of Named Graphs. Are you saying that the only way to access Linked Data is via SPARQL? Finding data in dbpedia that is mistaken/wrong/debateable undermines the whole project - the contract dbpedia offers is to reflect the wikipedia content that it offers. Er. its prime contract is a Name Corpus. In due course there will be lots of meshes from other domains Linked Data contributors e.g. BBC, Reuters, New York Times etc.. I really don't think so. Its prime contract is that I can resolve a URI for a NIR and get back things like Description, Location, etc.. If it gives me dodgy other stuff that I can't distinguish, I will have to stop using it, which would be a disaster. The goal of DBpedia was to set the ball rolling and in that regard its over achieved (albeit from my very biased view point). Oh yes! - but let's not let it get spoilt. Perfection is not an option on the Web or in the real world. We exist in a continuum that is inherently buggy, by design (otherwise it would be very boring). When we engineer things we accept all that - but what we then do is engineer systems so that they are robust to the imperfections. And it isn't really sensible/possible to distinguish the extra sameas from the real sameas. Eg http://dbpedia.org/resource/London and http://dbpedia.org/resource/Leondeon Sorry, I was wrong about these two being sameAs - they are dbpprop:redirect, although I don't think that it changes the story. Actually, in fact dbpprop:redirect may be a sub-property of owl:sameAs for all I know. (I think the URIs for http://dbpedia.org/property/ and http://dbpedia.org/ontology/ need fixing :-) ) I had inferred they were sameAs, since they sameAs yago or fbase stuff, which then get sameAs elsewhere. And on the other hand, freebase is now in danger of being undermined by this as well. As time goes by, the more I think this is going wrong. 
I think the complete opposite. We just need the traditional media players to comprehend that: Data is like Wine and Code is like Fish. Once understood, they will realize that the Web has simply introduced an inflection in the medium of value exchange, i.e., the HTTP URI as opposed to the URL (which replicates paper). Note, every media company is a high-quality Linked Data Space curator in disguise; they just need to understand what the Web really offers :-)

By this, I meant putting the co-reffing (sameAs) links in the RDF that is returned with the data about the NIR when a URI is resolved.

Best Hugh truncate/

-- Regards, Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen President CEO OpenLink Software Web: http://www.openlinksw.com

Cheers Hugh
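Hugh's follow-your-nose point is easy to reproduce in code: dereferencing the document behind the URI hands back a plain graph with no named-graph information at all. A small sketch with rdflib, assuming the /data/London.rdf document is still served as it was at the time.

```python
from rdflib import Graph

g = Graph()
# Follow-your-nose dereference of the document behind the resource URI
g.parse("http://dbpedia.org/data/London.rdf")

# The result is one flat graph: nothing in it says which server-side
# named graph(s) any of these triples came from.
print(len(g), "triples retrieved; no named-graph information in sight")
```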
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk: Are you saying that the only way to access Linked Data is via SPARQL? That is going a bit far, but in the end if you want to allow people to extend the model it has to be done using SPARQL. If the extension is taken well by users then it could be included in what is resolved for the URI but that doesn't mean it is not Linked Data up until the point it is included. I for one loved the recent addition of the Page Links set in a separate Named Graph, and I don't see how this is different. Cheers, Peter
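By contrast, extensions that live in their own named graph, such as the Page Links set Peter mentions, are reachable through the SPARQL endpoint even though they are not in the dereferenced document. A sketch with SPARQLWrapper; the graph IRI below is a guess for illustration, not the name DBpedia actually uses.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?target WHERE {
      GRAPH <http://dbpedia.org/pagelinks> {   # hypothetical graph IRI
        <http://dbpedia.org/resource/London> ?p ?target .
      }
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["target"]["value"])
```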
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Dear Peter, Thank you for your comments, which I think raise the main issues. On 12/08/2009 01:11, Peter Ansell ansell.pe...@gmail.com wrote: 2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk: Are you saying that the only way to access Linked Data is via SPARQL? That is going a bit far, but in the end if you want to allow people to extend the model it has to be done using SPARQL. If the extension is taken well by users then it could be included in what is resolved for the URI but that doesn't mean it is not Linked Data up until the point it is included. My view is that if you need to extend (I would say step outside) the model, then something is broken. Or at least it is broken until the model includes the extension, as you suggest. So we need to work out how to include such extensions in the model, if such a thing is desirable. Did I go too far? I'm not sure. I have a sense that the suggested solution to any problem I raise is Oh don't worry, just use a Named Graph. But How to Publish Linked Data on the Web (http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/), which really is an excellent description of what I think should be happening, makes no real mention of the idea that a SPARQL endpoint might be associated with Linked Data. In fact, it says that if you have a SPARQL endpoint (for example using D2R), you might use Pubby as a Linked Data interface in front of your SPARQL endpoint. And pubby says: Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. I infer from this that SPARQL endpoints are optional extras when publishing Linked Data. So any solutions to problems must work simply by resolving URIs. I for one loved the recent addition of the Page Links set in a separate Named Graph, and I don't see how this is different. That's great. I'd be interested to know how you make use of them? We find it very hard to make use of Named Graph data. All we start with is a URI for a NIR; so all we can do is resolve it. We cache the resulting RDF and then use it for analysis and fresnel rendering. It is pretty hard to build in anything that takes any notice of Named Graphs at arbitrary Linked Data sites. We would need to be able to find the SPARQL endpoint from a URI so that we can do the DESCRIBE, and then also be able to specify a Named Graph to go with it. In fact, how would I do that from http://dbpedia.org/resource/London ? I'm afraid I find Linked Data (by resolving URIs) really beautiful, and think I can understand how I and others might use it. So when it is suggested that the way to solve an issue with how it works is to step outside the RDFramework, I think it needs to be challenged or brought into the Framework. Cheers, Peter Cheers Peter. Hope that helps to show where I come from. Best Hugh
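One partial answer to Hugh's "how would I do that from http://dbpedia.org/resource/London?": if you already know the endpoint (which is exactly the missing link he identifies), you can at least ask which named graphs mention the URI and then scope further queries to them. A sketch, assuming a Virtuoso-style endpoint that exposes graph names.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT DISTINCT ?g WHERE {
      GRAPH ?g { <http://dbpedia.org/resource/London> ?p ?o }
    }
""")
sparql.setReturnFormat(JSON)

# Each binding is a named graph that says something about London
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["g"]["value"])
```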
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk: Dear Peter, Thank you for your comments, which I think raise the main issues. On 12/08/2009 01:11, Peter Ansell ansell.pe...@gmail.com wrote: 2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk: Are you saying that the only way to access Linked Data is via SPARQL? That is going a bit far, but in the end if you want to allow people to extend the model it has to be done using SPARQL. If the extension is taken well by users then it could be included in what is resolved for the URI but that doesn't mean it is not Linked Data up until the point it is included. My view is that if you need to extend (I would say step outside) the model, then something is broken. Or at least it is broken until the model includes the extension, as you suggest. So we need to work out how to include such extensions in the model, if such a thing is desirable. By extend I meant extend the information pool, and not necessarily extend the protocol, which should still work with some suggestions I make below. I definitely think extensions are useful, although they may need to appear with different URI's to the accepted set of information pieces that have been published and recognised as the minimal set by the original author. Did I go too far? I'm not sure. I have a sense that the suggested solution to any problem I raise is Oh don't worry, just use a Named Graph. But How to Publish Linked Data on the Web (http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/), which really is an excellent description of what I think should be happening, makes no real mention of the idea that a SPARQL endpoint might be associated with Linked Data. In fact, it says that if you have a SPARQL endpoint (for example using D2R), you might use Pubby as a Linked Data interface in front of your SPARQL endpoint. And pubby says: Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. I infer from this that SPARQL endpoints are optional extras when publishing Linked Data. So any solutions to problems must work simply by resolving URIs. I have a very similar approach to this with the Bio2RDF server, but I am using multiple SPARQL endpoints to provide resolution for URI's. I use the ability to get information by either URI resolution or SPARQL endpoints to create extended versions. SPARQL endpoints should be optional, but encouraged IMO, so people can pick and choose without having to transfer everything across the wire every time they access the information if they want to optimise their applications. I for one loved the recent addition of the Page Links set in a separate Named Graph, and I don't see how this is different. That's great. I'd be interested to know how you make use of them? We find it very hard to make use of Named Graph data. All we start with is a URI for a NIR; so all we can do is resolve it. We cache the resulting RDF and then use it for analysis and fresnel rendering. It is pretty hard to build in anything that takes any notice of Named Graphs at arbitrary Linked Data sites. We would need to be able to find the SPARQL endpoint from a URI so that we can do the DESCRIBE, and then also be able to specify a Named Graph to go with it. In fact, how would I do that from http://dbpedia.org/resource/London ? In short it is difficult, but not impossible if you are aware that there is some extra information that you want to include for your users that doesn't come from the URI resolution. 
I have been working on a system that can take notice of Named Graphs, but it doesn't work with arbitrary URI's as it requires the URI's to be normalised to some scheme that the software recognises. For instance, the normalised form of http://dbpedia.org/resource/London in my system is http://domain.name/dbpedia:London;, with the domain.name being specified by the user. By design it doesn't fit with the notion that URI's are opaque and shouldn't be modified, but it is hard to deny that it works. Resolving http://qut.bio2rdf.org/dbpedia:London for instance will include the PageLinks set along with any extensions that Matthias Samwald has included to link OBO to DBpedia (although in this case it is unlikely any would exist in this set) and some links that the DrugBank LODD project provide using their dataset in relation to DBpedia resources. If you want to know exactly which datasets would be resolved there is a URI for that... http://qut.bio2rdf.org/queryplan/dbpedia:London In some ways it isn't really typical Linked Data, but it allows the distributed extensions that I think people really want access to in some cases. I'm afraid I find Linked Data (by resolving URIs) really beautiful, and think I can understand how I and others might use it. So when it is suggested that the way to solve an issue with how it works is to step outside the RDFramework, I think it needs to be challenged or brought into the Framework. One way you could do it could be by
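Peter's normalisation step described earlier in this message could look roughly like the following; the prefix table and function are an illustrative sketch, not the actual Bio2RDF implementation.

```python
# Map a known source URI onto the resolver's prefix:localname form, e.g.
# http://dbpedia.org/resource/London -> http://qut.bio2rdf.org/dbpedia:London
PREFIXES = {
    "http://dbpedia.org/resource/": "dbpedia",
}

def normalise(uri: str, domain: str = "qut.bio2rdf.org") -> str:
    for base, prefix in PREFIXES.items():
        if uri.startswith(base):
            return f"http://{domain}/{prefix}:{uri[len(base):]}"
    raise ValueError(f"no normalisation rule for {uri}")

print(normalise("http://dbpedia.org/resource/London"))
```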
Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
Hugh Glaser wrote: Hi Kingsley. On 12/08/2009 00:28, Kingsley Idehen kide...@openlinksw.com wrote: Hugh Glaser wrote: On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote: On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote: Hi Kingsley, Pat and all, snip/ Everything on the Web is a claim by somebody. There are no facts, there is no truth, there are only opinions. Same is true of the Web and of life in general, but still there are laws about slander, etc.; and outrageous falsehoods are rebutted or corrected (eg look at how Wikipedia is managed); or else their source is widely treated as nonsensical, which I hardly think DBpedia wishes to be. And also, I think we do have some greater responsibility to give our poor dumb inference engines a helping hand, since they have no common sense to help them sort out the wheat from the chaff, unlike our enlightened human selves. Semantic Web applications must take this into account and therefore always assess data quality and trustworthiness before they do something with the data. I think that this discussion really emphasises how bad it is to put this co-ref data in the same store as the other data. Yes, they should be in distinct Named Graphs. I thought you would mention Named Graphs :-) This is the point I was making a while back (in relation to Alan's comments about the same thing). Yes, but this is the point I was making a while back about Named Graphs as a solution - when I resolve a URI (follow-my-nose) in the recommended fashion, I see no Named Graphs - they are only exposed in SPARQL stores. If I resolve http://dbpedia.org/resource/London to get http://dbpedia.org/data/London.rdf I see a bunch of RDF - go on, try it. No sight of Named Graphs. Correct, but the publisher of the Linked Data is putting HTTP URIs in front of the content of a Quad Store. These URIs are associated with SPARQL queries (in the case of DBpedia).With regards to the great example from yesterday, I deliberately put out two different views to demonstrate that you can partition data and not break the graph traversal desired by the follow-your-nose data exploration and discovery pattern. But note, and this is very important, the follow-your-nose pattern doesn't eradicate the fact that cul-de-sacs and T-junctions will also be part the Web of Linked Data. Are you saying that the only way to access Linked Data is via SPARQL? Finding data in dbpedia that is mistaken/wrong/debateable undermines the whole project - the contract dbpedia offers is to reflect the wikipedia content that it offers. Er. its prime contract is a Name Corpus. In due course there will be lots of meshes from other domains Linked Data contributors e.g. BBC, Reuters, New York Times etc.. I really don't think so. In my world view contract doesn't imply sole use or potential :-) Its prime contract is that I can resolve a URI for a NIR and get back things like Description, Location, etc.. I've written enough about HTTP URIs and their virtues [1]. Hopefully, we will forget the horrible term: NIR, really. It just about data items, their identifiers, and associated metadata. If it gives me dodgy other stuff that I can't distinguish, I will have to stop using it, which would be a disaster. The goal of DBpedia was to set the ball rolling and in that regard its over achieved (albeit from my very biased view point). Oh yes! - but let's not let it get spoilt. I really believe you are overreacting here. 
Ironically, you seem to have missed the trivial manner in which this data set was erased without any effect on DBpedia URIs whatsoever. Even at the time the data was loaded, you wouldn't have been able to de-reference this data from DBpedia URIs (back to the Named Graph issue above and follow-your-nose), since the SPARQL that generates the metadata for DBpedia's HTTP URIs is explicitly scoped to the Graph IRI: http://dbpedia.org . Remember, this linkset was basically a set of axioms that could have been used solely for backward-chained reasoning via SPARQL pragmas. Said SPARQL could even be used as a basis for a different set of HTTP URIs that point to the DBpedia ones (without explicit inverse triples in the DBpedia graph, and the link property doesn't have to be one that's symmetrical).

Perfection is not an option on the Web or in the real world. We exist in a continuum that is inherently buggy, by design (otherwise it would be very boring). When we engineer things we accept all that - but what we then do is engineer systems so that they are robust to the imperfections.

Sure re. robustness, but ironically you don't quite see the robustness and dexterity this whole episode has unveiled re. community discourse and rapid resolution etc. We would have had a little problem if the data had been loaded into the DBpedia Named