AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Chris Bizer

Hi Kingsley, Pat and all,

 Chris/Anja: I believe this data set was touched on your end, right?

Yes, Anja will fix the file and will send an updated version.

Pat Hayes wrote:
 This website should be taken down immediately, before it does serious 
 harm. It is irresponsible to publish such off-the-wall equivalentClass 
 assertions.

Pat: Your comment seems to imply that you see the Semantic Web as something
consistent that can be broken by individual information providers publishing
false information. If this is the case, the Semantic Web will never fly!

Everything on the Web is a claim by somebody. There are no facts, there is
no truth, there are only opinions.

Semantic Web applications must take this into account and therefore always
assess data quality and trustworthiness before they do something with the
data. If you build applications that break once somebody publishes false
information, you are obviously doomed.

As I thought this would be generally understood, I'm very surprised by your
comment.

Cheers,

Chris


 -Original Message-
 From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf Of Kingsley Idehen
 Sent: Monday, 10 August 2009 23:29
 To: Kavitha Srinivas
 Cc: Tim Finin; Anja Jentzsch; public-lod@w3.org; dbpedia-discuss...@lists.sourceforge.net; Chris Bizer
 Subject: Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval
 
 Kavitha Srinivas wrote:
  I will fix the URIs.. I believe the equivalentClass assertions were
  added in by someone at OpenLink (I just sent the raw file with the
  conditional probabilities for each pair of types that were above the
  .80 threshold).  So can whoever uploaded the file fix the property to
  what Tim suggested?
 Hmm,  I didn't touch the file, neither did anyone else at OpenLink. I
 just downloaded what was uploaded at:
 http://wiki.dbpedia.org/Downloads33, and, based on my own personal best
 practices, put the data in a separate Named Graph :-)
 
 Chris/Anja: I believe this data set was touched on your end, right?
 Please make the fixes in line with the findings from the conversation on
 this thread. Once corrected, I or someone else will reload.
 
 Kingsley
 
  Thanks!
  Kavitha
  On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:
 
  Kavitha Srinivas wrote:
  Agree completely -- which is why I sent a base file which had the
  conditional probabilities, the mapping, and the values to be able to
  compute marginals.
  About the URIs, I should have added in my email that because
  freebase types are not URIs, and have types such as /people/person,
  we added a base URI: http://freebase.com to the types.  Sorry I
  missed mentioning that...
  Kavitha
  Kavitha,
 
  If you apply the proper URIs, and then apply fixes to the mappings
  (from prior suggestions) we are set.  You can send me another dump
  and I will go one step further and put some sample SPARQL queries
  together which demonstrate how we can have many world views on the
  Web of Linked Data without anyone getting hurt in the process :-)
 
  Kingsley
 
  On Aug 10, 2009, at 4:42 PM, Tim Finin wrote:
 
  Kavitha Srinivas wrote:
  I understand what you are saying -- but some of this reflects the
  way types are associated with freebase instances.  The types are
  more like 'tags' in the sense that there is no hierarchy, but each
  instance is annotated with multiple types.  So an artist would in
  fact be annotated with person reliably (and probably less
  consistently with /music/artist).  Similar issues with Uyghurs,
  murdered children etc.  The issue is differences in modeling
  granularity as well.  Perhaps a better thing to look at is the set of types
  where the YAGO types map to WordNet (this is usually at a coarser
  level of granularity).
 
  One way to approach this problem is to use a framework to mix logical
  constraints with probabilistic ones.  My colleague Yun Peng has been
  exploring integrating data backed by OWL ontologies with Bayesian
  information,
  with applications for ontology mapping.  See [1] for recent papers
  on this
  as well as a recent PhD thesis [2] that I think also may be relevant.
 
  [1]
 

http://ebiquity.umbc.edu/papers/select/search/html/613a353a7b693a303b643a37383b693a313b643a303b693a323b733a303a3b693a333b733a303a3b693a343b643a303b7d/
 
  [2]
  http://ebiquity.umbc.edu/paper/html/id/427/Constraint-Generation-and-Reasoning-in-OWL
 
 
 
 
 
 
  --
 
 
  Regards,
 
  Kingsley Idehen  Weblog: http://www.openlinksw.com/blog/~kidehen
  President & CEO OpenLink Software Web: http://www.openlinksw.com
 
 
 
 
 
 
 
 
 --
 
 
 Regards,
 
 Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
 President & CEO
 OpenLink Software Web: http://www.openlinksw.com
 
 
 





Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Anja Jentzsch

Hi Kingsley, hi Kavitha,

We will generate a new dump covering all YAGO-Freebase relations with at 
least 95% probability, using skos:narrower as proposed by Tim.

I will let you know when the file is available for download.

Cheers,
Anja
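
For illustration, a minimal sketch of the rewrite Anja describes: turning the
over-strong owl:equivalentClass links into skos:narrower links. The graph IRI
below is hypothetical, and the direction of skos:narrower would need to be
checked per mapping (here the Freebase type is assumed to be the broader one):

    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # Rewrite the equivalence links from the (hypothetical) linkset graph
    # as SKOS hierarchy links instead.
    CONSTRUCT { ?freebaseType skos:narrower ?yagoType }
    WHERE {
      GRAPH <http://example.org/yago-freebase-links> {
        ?freebaseType owl:equivalentClass ?yagoType .
      }
    }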

Kingsley Idehen wrote:

Kavitha Srinivas wrote:
I will fix the URIs.. I believe the equivalentClass assertions were 
added in by someone at OpenLink (I just sent the raw file with the 
conditional probabilities for each pair of types that were above the 
.80 threshold).  So can whoever uploaded the file fix the property to 
what Tim suggested?
Hmm,  I didn't touch the file, neither did anyone else at OpenLink. I 
just downloaded what was uploaded at: 
http://wiki.dbpedia.org/Downloads33, and, based on my own personal best 
practices, put the data in a separate Named Graph :-)


Chris/Anja: I believe this data set was touched on your end, right? 
Please make the fixes in line with the findings from the conversation on 
this thread. Once corrected, I or someone else will reload.


Kingsley


Thanks!
Kavitha
On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:


Kavitha Srinivas wrote:
Agree completely -- which is why I sent a base file which had the 
conditional probabilities, the mapping, and the values to be able to 
compute marginals.
About the URIs, I should have added in my email that because 
freebase types are not URIs, and have types such as /people/person, 
we added a base URI: http://freebase.com to the types.  Sorry I 
missed mentioning that...

Kavitha

Kavitha,

If you apply the proper URIs, and then apply fixes to the mappings 
(from prior suggestions) we are set.  You can send me another dump 
and I will go one step further and put some sample SPARQL queries 
together which demonstrate how we can have many world views on the 
Web of Linked Data without anyone getting hurt in the process :-)


Kingsley


On Aug 10, 2009, at 4:42 PM, Tim Finin wrote:


Kavitha Srinivas wrote:
I understand what you are saying -- but some of this reflects the 
way types are associated with freebase instances.  The types are 
more like 'tags' in the sense that there is no hierarchy, but each 
instance is annotated with multiple types.  So an artist would in 
fact be annotated with person reliably (and probably less 
consistently with /music/artist).  Similar issues with Uyghurs, 
murdered children etc.  The issue is differences in modeling 
granularity as well.  Perhaps a better thing to look at is the set of types 
where the YAGO types map to WordNet (this is usually at a coarser 
level of granularity).


One way to approach this problem is to use a framework to mix logical
constraints with probabilistic ones.  My colleague Yun Peng has been
exploring integrating data backed by OWL ontologies with Bayesian 
information,
with applications for ontology mapping.  See [1] for recent papers 
on this

as well as a recent PhD thesis [2] that I think also may be relevant.

[1] 
http://ebiquity.umbc.edu/papers/select/search/html/613a353a7b693a303b643a37383b693a313b643a303b693a323b733a303a3b693a333b733a303a3b693a343b643a303b7d/ 

[2] 
http://ebiquity.umbc.edu/paper/html/id/427/Constraint-Generation-and-Reasoning-in-OWL 








--


Regards,

Kingsley Idehen  Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software Web: http://www.openlinksw.com

Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Tim Finin

Kavitha Srinivas wrote:
Actually I wonder if it makes sense to annotate the relation with the 
actual conditional probability as suggested by Tim. We found the 
probabilities quite useful in different applications. If we go this 
route we could also send you the set of types with very low 
probabilities -- which is very useful if you want to know for instance 
that an instance is almost never both a Car and a Person.


Making the conditional probabilities available in some form (not necessarily
RDF) is a great idea.  People could use this data in many ways, I think.  It's
less clear to me how this might be integrated into DBpedia or the LOD cloud. But
having the data available should facilitate experimentation with how best to
use it.
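
As a sketch of how published probabilities could be consumed if they were ever
exposed as RDF (the ex: properties and the link resource below are hypothetical
placeholders, not part of any released dump), an application could keep only
the high-confidence mappings:

    PREFIX ex: <http://example.org/ns#>

    # Select type mappings whose conditional probability clears a threshold.
    # ex:sourceType, ex:targetType and ex:conditionalProbability stand in
    # for whatever annotation scheme is eventually chosen.
    SELECT ?freebaseType ?yagoType ?p
    WHERE {
      ?link ex:sourceType ?freebaseType ;
            ex:targetType ?yagoType ;
            ex:conditionalProbability ?p .
      FILTER (?p >= 0.95)
    }
    ORDER BY DESC(?p)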




Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Pat Hayes


On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote:



Hi Kingsley, Pat and all,


Chris/Anja: I believe this data set was touched on your end, right?


Yes, Anja will fix the file and will send an updated version.


Thanks.



Pat Hayes wrote:
This website should be taken down immediately, before it does  
serious
harm. It is irresponsible to publish such off-the-wall  
equivalentClass

assertions.


Pat: Your comment seems to imply that you see the Semantic Web as  
something
consistent that can be broken by individual information providers  
publishing
false information. If this is the case, the Semantic Web will never  
fly!


Agreed, but surely we can expect something better than this. We will  
of course need to have ways (not yet elucidated) of locating the  
sources of inconsistencies and correcting or avoiding them. In the  
meantime, many of us are worrying about how to achieve mutual  
consistency between rival high-level ontologies.




Everything on the Web is a claim by somebody. There are no facts,  
there is

no truth, there are only opinions.


Same is true of the Web and of life in general, but still there are  
laws about slander, etc.; and outrageous falsehoods are rebutted or  
corrected (e.g. look at how Wikipedia is managed); or else their source  
is widely treated as nonsensical, which I hardly think DBpedia wishes  
to be. And also, I think we do have some greater responsibility to  
give our poor dumb inference engines a helping hand, since they have  
no common sense to help them sort out the wheat from the chaff, unlike  
our enlightened human selves.




Semantic Web applications must take this into account and therefore  
always
assess data quality and trustworthiness before they do something  
with the

data.


In a perfect world, but in practice this isn't possible. There are no  
criteria yet available for making such judgements, or even for  
locating the true source of a discovered inconsistency. About the only  
way to do it is to judge the veracity of the source; and if one  
cannot trust DBpedia not to say blatant falsehoods, whom can one trust?  
And I would draw a distinction between what one might call fact-level  
disagreements (about the population of India, say) and high- or mid- 
level problems, which are much harder to deal with. Introducing  
gratuitous, wildly false, claims into the upper middle levels of a  
class hierarchy is liable to produce a very large number of  
inconsistencies down the line which will be very hard to identify and  
very hard to correct. They may appear as apparent errors in instance  
data, for example.



If you build applications that brake once somebody publishes false
information, you are obviously doomed.


Of course, but there are degrees of falsehood. To assert that hundreds  
of dissimilar, mid-level ontological categories are all identical is  
the most egregious kind of falsehood. In fact it's not really a  
falsehood: it was simply a mistake. Nobody actually thought these  
classes were equal in extent, not for a second. They just didn't know,  
or perhaps didn't care, what 'equivalentClass' means. Hence my rather  
strongly worded protest. The subtext was: please understand, and pay  
attention to, what the relations in your assertions mean. They are not  
just vague links in a vaguely defined associative network.


But in any case, thanks to the workers for the rapid repair response.

Pat




As I thought this would be generally understood, I'm very surprised  
by your

comment.

Cheers,

Chris



-Original Message-
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf Of Kingsley Idehen
Sent: Monday, 10 August 2009 23:29
To: Kavitha Srinivas
Cc: Tim Finin; Anja Jentzsch; public-lod@w3.org; dbpedia-discuss...@lists.sourceforge.net; Chris Bizer
Subject: Re: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

Kavitha Srinivas wrote:

I will fix the URIs.. I believe the equivalentClass assertions were
added in by someone at OpenLink (I just sent the raw file with the
conditional probabilities for each pair of types that were above the
.80 threshold).  So can whoever uploaded the file fix the property  
to

what Tim suggested?

Hmm,  I didn't touch the file, neither did anyone else at OpenLink. I
just downloaded what was uploaded at:
http://wiki.dbpedia.org/Downloads33, and, based on my own personal best
practices, put the data in a separate Named Graph :-)

Chris/Anja: I believe this data set was touched on your end, right?
Please make the fixes in line with the findings from the  
conversation on

this thread. Once corrected, I or someone else will reload.

Kingsley


Thanks!
Kavitha
On Aug 10, 2009, at 5:03 PM, Kingsley Idehen wrote:


Kavitha Srinivas wrote:

Agree completely -- which is why I sent a base file which had the
conditional probabilities, the mapping, and the values to be  
able to

compute marginals.
About the URIs, I 

Re: [HELP] Can you please update information about your dataset?

2009-08-11 Thread Leigh Dodds
Hi,

I've just added several new datasets to the Statistics page that
weren't previously listed. It's not really a great user experience
editing the wiki markup and manually adding up the figures.

So, thinking out loud, I'm wondering whether it might be more
appropriate to use a Google spreadsheet and one of their submission
forms for the purposes of collecting the data. A little manual
editing to remove duplicates might make managing this data a little
easier. Especially as there are also pages that separately list
the available SPARQL endpoints and RDF dumps.

I'm sure we could create something much better using voiD, etc., but for
now, maybe using a slightly better tool would give us a little more
progress? It'd be a snip to dump out the Google Spreadsheet data
programmatically too, which'd be another improvement on the current
situation.

What does everyone else think?

Cheers,

L.

2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk:
 Dear all,

 We are planning to produce an updated data cloud diagram based on the
 dataset information on the esw wiki page:
 http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

 If you have not published your dataset there yet and you would like your
 dataset to be included, can you please add your dataset there?

 If you have an entry there for your dataset already, can you please update
 information about your dataset on the wiki?

 If you cannot edit the wiki page any more because of the recent update of the esw
 wiki editing policy, you can send the information to me or Anja, who is
 cc'ed. We can update it for you.

 If you know your friends have datasets on the wiki, but are not on the
 mailing list, can you please kindly forward this email to them? We would
 like to get the data cloud as up-to-date as possible.

 For this release, we will use the above wiki page as the information
 gathering point. We do apologize if you have published information about
 your dataset on other web pages and this request would mean extra work for
 you.

 Many thanks for your contributions!

 Kindest regards,

 Jun






-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: ANN: D2R Server and D2RQ V0.7 released

2009-08-11 Thread Kingsley Idehen

Christian Becker wrote:

Hi all,

we are happy to announce the release of D2R Server and D2RQ Version 
0.7 and recommend that all users replace old installations with the new 
release.


Version 0.7 provides:

- Several bugfixes
- Support for Microsoft SQL Server
- Support for dynamic properties (by Jörg Henß)
- Support for limits on property bridge level (by Matthias Quasthoff)
- Better dump performance
- New optimizations that must be enabled using D2R Server's --fast 
switch or using d2rq:useAllOptimizations


More information about the tools can be found on the

1. D2RQ Platform website: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
2. D2R Server website: http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

The new releases can be downloaded from Sourceforge

http://sourceforge.net/projects/d2rq-map/

Lots of thanks for their magnificent work to:

- Andreas Langegger and Herwig Leimer for continued improvements of 
the D2RQ engine

- Richard Cyganiak for input on several design issues
- Jörg Henß for adding support for dynamic properties
- Matthias Quasthoff for adding limit support on property bridge level 
(d2rq:limit, d2rq:limitInverse, d2rq:orderAsc and d2rq:orderDesc)

- Alistair Miles for a patch adding SQL cursor support

Cheers,

Christian


Chris,

Why isn't D2R JDBC- and/or ODBC-based, in a generic sense? Both APIs 
provide enough metadata-oriented calls to enable a more RDBMS-agnostic 
variant of D2R.


As I am sure you can imagine, my hair stands on end (literally) whenever I 
encounter RDBMS-specific client apps. :-)


--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








Re: ANN: D2R Server and D2RQ V0.7 released

2009-08-11 Thread Richard Cyganiak

Kingsley,

On 11 Aug 2009, at 18:45, Kingsley Idehen wrote:
Why isn't D2R JDBC and/or ODBC based, in a generic sense? Both APIs  
provide enough Metadata oriented APIs for enabling a more RDBMS  
agnostic variant of D2R.


Glad to inform you that D2RQ is JDBC based, and has been since 2004.

Best,
Richard





As I am sure you can imagine, my hair stands (literally) whenever I  
encounter RDBMS specific client apps. :-)


--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software Web: http://www.openlinksw.com










Re: [HELP] Can you please update information about your dataset?

2009-08-11 Thread Aldo Bucchi

Hi,

On Aug 11, 2009, at 13:46, Kingsley Idehen kide...@openlinksw.com  
wrote:



Leigh Dodds wrote:

Hi,

I've just added several new datasets to the Statistics page that
weren't previously listed. It's not really a great user experience
editing the wiki markup and manually adding up the figures.

So, thinking out loud, I'm wondering whether it might be more
appropriate to use a Google spreadsheet and one of their submission
forms for the purposes of collecting the data. A little manual
editing to remove duplicates might make managing this data a little
easier. Especially as there are also pages that separately list
the available SPARQL endpoints and RDF dumps.

I'm sure we could create something much better using Void, etc but  
for

now, maybe using a slightly better tool would give us a little more
progress? It'd be a snip to dump out the Google Spreadsheet data
programmatically too, which'd be another improvement on the current
situation.

What does everyone else think?

Nice Idea! Especially as Google Spreadsheet to RDF is just about  
RDFizers for the Google Spreadsheet API :-)


Hehe. I have this in my todo (literally): a website that exposes a  
Google spreadsheet as a SPARQL endpoint. Internally we use it as a UI to  
quickly create config files et al.

But it will remain in my todo forever... ;)

Kingsley, this could be sponged. The trick is that the spreadsheet  
must have an accompanying page/sheet/book with metadata (the NS or  
explicit URIs for cols).




Kingsley

Cheers,

L.

2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk:


Dear all,

We are planning to produce an updated data cloud diagram based on  
the

dataset information on the esw wiki page:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

If you have not published your dataset there yet and you would  
like your

dataset to be included, can you please add your dataset there?

If you have an entry there for your dataset already, can you  
please update

information about your dataset on the wiki?

 If you cannot edit the wiki page any more because of the recent  
 update of the esw
wiki editing policy, you can send the information to me or Anja,  
who is

cc'ed. We can update it for you.

 If you know your friends have datasets on the wiki, but are not on  
the
mailing list, can you please kindly forward this email to them? We  
would

like to get the data cloud as up-to-date as possible.

For this release, we will use the above wiki page as the information
gathering point. We do apologize if you have published information  
about
your dataset on other web pages and this request would mean extra  
work for

you.

Many thanks for your contributions!

Kindest regards,

Jun



--


Regards,

Kingsley Idehen  Weblog: http://www.openlinksw.com/blog/~kidehen
 President & CEO OpenLink Software Web: http://www.openlinksw.com









Re: [HELP] Can you please update information about your dataset?

2009-08-11 Thread Jun Zhao

Hi Michael,

I have taken this dataset off the list.

I also believe that Yves has managed to update the record about BBC 
Music, thanks to Michael H. :)


cheers,

Jun

Michael Smethurst wrote:

Hi Jun/all

Just noticed the line on:

http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

saying:

BBC Later + TOTP (link not responding - 2009-04-01)

That's my bad. The site's been down since we forgot to pay our ec2 bills :-/

Having said that, the data has either moved or is in the process of 
moving to BBC Programmes and BBC Music, so TOTP/Later should probably 
come off the cloud piccie and off the list, and BBC Music should be added 
in, linking to MusicBrainz and BBC Programmes. The TOTP/Later site was 
only ever intended as a try out and a demo to management types at the 
BBC. Strangely it seems to have worked... :-)


Also note that BBC Programmes and Music now have a joint SPARQL 
endpoint - well, 2 in fact:

http://api.talis.com/stores/bbc-backstage
http://bbc.openlinksw.com/sparql

Sorry for not notifying earlier

Michael


-Original Message-
From: public-lod-requ...@w3.org on behalf of Jun Zhao
Sent: Fri 8/7/2009 7:05 PM
To: public-lod@w3.org
Cc: Anja Jentzsch
Subject: [HELP] Can you please update information about your dataset?

Dear all,

We are planning to produce an updated data cloud diagram based on the
dataset information on the esw wiki page:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

If you have not published your dataset there yet and you would like your
dataset to be included, can you please add your dataset there?

If you have an entry there for your dataset already, can you please
update information about your dataset on the wiki?

If you cannot edit the wiki page any more because of the recent update of
the esw wiki editing policy, you can send the information to me or Anja, who
is cc'ed. We can update it for you.

If you know your friends have datasets on the wiki, but are not on the
mailing list, can you please kindly forward this email to them? We would
like to get the data cloud as up-to-date as possible.

For this release, we will use the above wiki page as the information
gathering point. We do apologize if you have published information about
your dataset on other web pages and this request would mean extra work
for you.

Many thanks for your contributions!

Kindest regards,

Jun








Re: [HELP] Can you please update information about your dataset?

2009-08-11 Thread Hugh Glaser
Please no! Not another manual entry system.
I had already decided I just haven't got the time to manually maintain this 
constantly changing set of numbers, so would not be responding to the request 
to update.
(In fact, the number of different places that a good LD citizen has to put 
their data into the esw wiki is really rather high.)
Last time Anja was kind enough to put a lot of effort into processing the 
graphviz for us to generate the numbers, but this is not the way to do it.
In our case, we have 39 different stores, with linkages between them and to 
others outside.
There are therefore 504 numbers to represent the linkage, although they don't 
all meet a threshold.
For details of the linkage in rkbexplorer see pictures at 
http://www.rkbexplorer.com/linkage/ or query http://void.rkbexplorer.com/ .
And these figures are constantly changing, as the system identifies more - 
there can be more than 1000 a day.

If any more work is to be put into generating this picture, it really should be 
from voiD descriptions, which we already make available for all our datasets.
And for those who want to do it by hand, a simple system to allow them to 
specify the linkage using voiD would get the entry into a format for the voiD 
processor to use (I'm happy to host the data if need be).
Or Aldo's system could generate its RDF using the voiD ontology, thus providing 
the manual entry system?

I know we have been here before, and almost got to the voiD processor thing:- 
please can we try again?
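
As a sketch of what a voiD-driven generator could run over harvested
descriptions (assuming they use void:Linkset with two void:target values and
void:linkPredicate; triple counts would come from voiD's statistics mechanism
and are omitted here):

    PREFIX void: <http://rdfs.org/ns/void#>

    # List every published linkset, the datasets it connects and the
    # predicate used for the links; duplicates from the symmetric
    # ?dataset1/?dataset2 bindings would still need to be folded.
    SELECT ?linkset ?dataset1 ?dataset2 ?predicate
    WHERE {
      ?linkset a void:Linkset ;
               void:target ?dataset1, ?dataset2 ;
               void:linkPredicate ?predicate .
      FILTER (?dataset1 != ?dataset2)
    }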

Best
Hugh

On 11/08/2009 19:00, Aldo Bucchi aldo.buc...@gmail.com wrote:

Hi,

On Aug 11, 2009, at 13:46, Kingsley Idehen kide...@openlinksw.com
wrote:

 Leigh Dodds wrote:
 Hi,

 I've just added several new datasets to the Statistics page that
 weren't previously listed. It's not really a great user experience
 editing the wiki markup and manually adding up the figures.

 So, thinking out loud, I'm wondering whether it might be more
 appropriate to use a Google spreadsheet and one of their submission
 forms for the purposes of collecting the data. A little manual
 editing to remove duplicates might make managing this data a little
 easier. Especially as there are also pages that separately list
 the available SPARQL endpoints and RDF dumps.

 I'm sure we could create something much better using Void, etc but
 for
 now, maybe using a slightly better tool would give us a little more
 progress? It'd be a snip to dump out the Google Spreadsheet data
 programmatically too, which'd be another improvement on the current
 situation.

 What does everyone else think?

 Nice Idea! Especially as Google Spreadsheet to RDF is just about
 RDFizers for the Google Spreadsheet API :-)

Hehe. I have this in my todo (literally): a website that exposes a
Google spreadsheet as a SPARQL endpoint. Internally we use it as a UI to
quickly create config files et al.
But it will remain in my todo forever... ;)

Kingsley, this could be sponged. The trick is that the spreadsheet
must have an accompanying page/sheet/book with metadata (the NS or
explicit URIs for cols).


 Kingsley
 Cheers,

 L.

 2009/8/7 Jun Zhao jun.z...@zoo.ox.ac.uk:

 Dear all,

 We are planning to produce an updated data cloud diagram based on
 the
 dataset information on the esw wiki page:
 http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

 If you have not published your dataset there yet and you would
 like your
 dataset to be included, can you please add your dataset there?

 If you have an entry there for your dataset already, can you
 please update
 information about your dataset on the wiki?

  If you cannot edit the wiki page any more because of the recent
  update of the esw
 wiki editing policy, you can send the information to me or Anja,
 who is
 cc'ed. We can update it for you.

  If you know your friends have datasets on the wiki, but are not on
 the
 mailing list, can you please kindly forward this email to them? We
 would
 like to get the data cloud as up-to-date as possible.

 For this release, we will use the above wiki page as the information
 gathering point. We do apologize if you have published information
 about
 your dataset on other web pages and this request would mean extra
 work for
 you.

 Many thanks for your contributions!

 Kindest regards,

 Jun












 --


 Regards,

 Kingsley Idehen  Weblog: http://www.openlinksw.com/blog/~kidehen
  President & CEO OpenLink Software Web: http://www.openlinksw.com










Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Hugh Glaser
On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote:

 
 
 On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote:
 
 
 Hi Kingsley, Pat and all,
 
snip/
 
 Everything on the Web is a claim by somebody. There are no facts,
 there is
 no truth, there are only opinions.
 
 Same is true of the Web and of life in general, but still there are
 laws about slander, etc.; and outrageous falsehoods are rebutted or
 corrected (eg look at how Wikipedia is managed); or else their source
 is widely treated as nonsensical, which I hardly think DBpedia wishes
 to be. And also, I think we do have some greater responsibility to
 give our poor dumb inference engines a helping hand, since they have
 no common sense to help them sort out the wheat from the chaff, unlike
 our enlightened human selves.
 
 
 Semantic Web applications must take this into account and therefore
 always
 assess data quality and trustworthiness before they do something
 with the
 data.
I think that this discussion really emphasises how bad it is to put this
co-ref data in the same store as the other data.
Finding data in DBpedia that is mistaken/wrong/debatable undermines the
whole project - the contract DBpedia offers is to reflect the content of
Wikipedia.
And it isn't really sensible/possible to distinguish the extra sameAs from
the real sameAs.
E.g. http://dbpedia.org/resource/London and
http://dbpedia.org/resource/Leondeon

And on the other hand, freebase is now in danger of being undermined by this
as well.

As time goes by, the more I think this is going wrong.

Best
Hugh
truncate/




Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Hugh Glaser
Hi Kingsley.

On 12/08/2009 00:28, Kingsley Idehen kide...@openlinksw.com wrote:

 Hugh Glaser wrote:
 On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote:
 
  
 On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote:
 

 Hi Kingsley, Pat and all,
 
  
 snip/
  
 Everything on the Web is a claim by somebody. There are no facts,
 there is
 no truth, there are only opinions.
  
 Same is true of the Web and of life in general, but still there are
 laws about slander, etc.; and outrageous falsehoods are rebutted or
 corrected (eg look at how Wikipedia is managed); or else their source
 is widely treated as nonsensical, which I hardly think DBpedia wishes
 to be. And also, I think we do have some greater responsibility to
 give our poor dumb inference engines a helping hand, since they have
 no common sense to help them sort out the wheat from the chaff, unlike
 our enlightened human selves.
 

 Semantic Web applications must take this into account and therefore
 always
 assess data quality and trustworthiness before they do something
 with the
 data.
  
 I think that this discussion really emphasises how bad it is to put this
 co-ref data in the same store as the other data.
  
 Yes, they should be in distinct Named Graphs.
I thought you would mention Named Graphs :-)
 
 This is the point I was making a while back (in relation to Alan's
 comments about the same thing).
Yes, but this is the point I was making a while back about Named Graphs as a
solution - when I resolve a URI (follow-my-nose) in the recommended fashion,
I see no Named Graphs - they are only exposed in SPARQL stores.
If I resolve http://dbpedia.org/resource/London to get
http://dbpedia.org/data/London.rdf I see a bunch of RDF - go on, try it. No
sight of Named Graphs.
Are you saying that the only way to access Linked Data is via SPARQL?
 Finding data in dbpedia that is mistaken/wrong/debateable undermines the
 whole project - the contract dbpedia offers is to reflect the wikipedia
 content that it offers.
  
  Er, its prime contract is a name corpus. In due course there will be
  lots of meshes from Linked Data contributors in other domains, e.g. BBC,
  Reuters, New York Times, etc.
I really don't think so.
Its prime contract is that I can resolve a URI for a NIR and get back things
like Description, Location, etc.
If it gives me dodgy other stuff that I can't distinguish, I will have to
stop using it, which would be a disaster.
 
 The goal of DBpedia was to set the ball rolling and in that regard its
 over achieved (albeit from my very biased view point).
Oh yes! - but let's not let it get spoilt.
 
 
 Perfection is not an option on the Web or in the real world. We exist in
 a continuum that is inherently buggy, by design (otherwise it would be
 very boring).
When we engineer things we accept all that - but what we then do is engineer
systems so that they are robust to the imperfections.
 
 And it isn't really sensible/possible to distinguish the extra sameas from
 the real sameas.
 Eg http://dbpedia.org/resource/London and
 http://dbpedia.org/resource/Leondeon
Sorry, I was wrong about these two being sameAs - they are dbpprop:redirect,
although I don't think that it changes the story.
Actually, in fact dbpprop:redirect may be a sub-property of owl:sameAs for
all I know.
(I think the URIs for http://dbpedia.org/property/ and
http://dbpedia.org/ontology/ need fixing :-) )
I had inferred they were sameAs, since they sameAs yago or fbase stuff,
which then get sameAs elsewhere.

 
 And on the other hand, freebase is now in danger of being undermined by this
 as well.
 
 As time goes by, the more I think this is going wrong.
  
 
 I think the complete opposite.
 
 We just need the traditional media players to comprehend that: Data is
 like Wine and Code is like Fish. Once understood, they will realize that
  the Web has simply introduced a medium-of-value-exchange inflection,
  i.e., the HTTP URI as opposed to the URL (which replicates paper). Note,
  every media company is a high-quality Linked Data Space curator in
  disguise; they just need to understand what the Web really offers :-)
By this, I meant putting the co-reffing (sameAs) links in the RDF that is
returned with the data about the NIR when a URI is resolved.
 
 
 Best
 Hugh
 truncate/
 
 
  
 
 
 --
 
 
 Regards,
Cheers
Hugh
 
 Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
  President & CEO
 OpenLink Software Web: http://www.openlinksw.com
 
 
 
 
 
 




Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Peter Ansell
2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk:
 Are you saying that the only way to access Linked Data is via SPARQL?

That is going a bit far, but in the end if you want to allow people to
extend the model it has to be done using SPARQL. If the extension is
taken well by users then it could be included in what is resolved for
the URI but that doesn't mean it is not Linked Data up until the point
it is included.

I for one loved the recent addition of the Page Links set in a
separate Named Graph, and I don't see how this is different.

Cheers,

Peter



Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Hugh Glaser
Dear Peter,
Thank you for your comments, which I think raise the main issues.

On 12/08/2009 01:11, Peter Ansell ansell.pe...@gmail.com wrote:

 2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk:
 Are you saying that the only way to access Linked Data is via SPARQL?
 
 That is going a bit far, but in the end if you want to allow people to
 extend the model it has to be done using SPARQL. If the extension is
 taken well by users then it could be included in what is resolved for
 the URI but that doesn't mean it is not Linked Data up until the point
 it is included.
My view is that if you need to extend (I would say step outside) the
model, then something is broken. Or at least it is broken until the model
includes the extension, as you suggest. So we need to work out how to
include such extensions in the model, if such a thing is desirable.

Did I go too far?
I'm not sure. I have a sense that the suggested solution to any problem I
raise is "Oh don't worry, just use a Named Graph".
But How to Publish Linked Data on the Web
(http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/),
which really is an excellent description of what I think should be
happening, makes no real mention of the idea that a SPARQL endpoint might be
associated with Linked Data.
In fact, it says that if you have a SPARQL endpoint (for example using D2R),
you might use Pubby as a Linked Data interface in front of your SPARQL
endpoint.
And Pubby says:
"Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server."
I infer from this that SPARQL endpoints are optional extras when publishing
Linked Data. So any solutions to problems must work simply by resolving
URIs.
 
 I for one loved the recent addition of the Page Links set in a
 separate Named Graph, and I don't see how this is different.
That's great.
I'd be interested to know how you make use of them?
We find it very hard to make use of Named Graph data.
All we start with is a URI for a NIR; so all we can do is resolve it.
We cache the resulting RDF and then use it for analysis and Fresnel
rendering.
It is pretty hard to build in anything that takes any notice of Named Graphs
at arbitrary Linked Data sites. We would need to be able to find the SPARQL
endpoint from a URI so that we can do the DESCRIBE, and then also be able to
specify a Named Graph to go with it. In fact, how would I do that from
http://dbpedia.org/resource/London ?
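
For concreteness, a sketch of the query being described here; it only works if
you already know, out of band, both the endpoint and the graph IRI, which is
exactly the knowledge a plain URI resolver does not have (the graph name below
is an assumption about DBpedia's setup):

    # Run against http://dbpedia.org/sparql, restricting the description
    # to a single named graph.
    DESCRIBE <http://dbpedia.org/resource/London>
    FROM <http://dbpedia.org>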

I'm afraid I find Linked Data (by resolving URIs) really beautiful, and
think I can understand how I and others might use it. So when it is
suggested that the way to solve an issue with how it works is to step
outside the RDFramework, I think it needs to be challenged or brought into
the Framework.
 
 Cheers,
 
 Peter
 
Cheers Peter.
Hope that helps to show where I come from.
Best
Hugh




Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Peter Ansell
2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk:
 Dear Peter,
 Thank you for your comments, which I think raise the main issues.

 On 12/08/2009 01:11, Peter Ansell ansell.pe...@gmail.com wrote:

 2009/8/12 Hugh Glaser h...@ecs.soton.ac.uk:
 Are you saying that the only way to access Linked Data is via SPARQL?

 That is going a bit far, but in the end if you want to allow people to
 extend the model it has to be done using SPARQL. If the extension is
 taken well by users then it could be included in what is resolved for
 the URI but that doesn't mean it is not Linked Data up until the point
 it is included.
 My view is that if you need to extend (I would say step outside) the
 model, then something is broken. Or at least it is broken until the model
 includes the extension, as you suggest. So we need to work out how to
 include such extensions in the model, if such a thing is desirable.

By "extend" I meant extend the information pool, and not necessarily
extend the protocol, which should still work with some suggestions I
make below.

I definitely think extensions are useful, although they may need to
appear with different URIs to the accepted set of information pieces
that have been published and recognised as the minimal set by the
original author.

 Did I go too far?
 I'm not sure. I have a sense that the suggested solution to any problem I
 raise is Oh don't worry, just use a Named Graph.
 But How to Publish Linked Data on the Web
 (http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/),
 which really is an excellent description of what I think should be
 happening, makes no real mention of the idea that a SPARQL endpoint might be
 associated with Linked Data.
 In fact, it says that if you have a SPARQL endpoint (for example using D2R),
 you might use Pubby as a Linked Data interface in front of your SPARQL
 endpoint.
 And pubby says:
 Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server.
 I infer from this that SPARQL endpoints are optional extras when publishing
 Linked Data. So any solutions to problems must work simply by resolving
 URIs.

I have a very similar approach to this with the Bio2RDF server, but I
am using multiple SPARQL endpoints to provide resolution for URIs.

I use the ability to get information by either URI resolution or
SPARQL endpoints to create extended versions. SPARQL endpoints should
be optional, but encouraged IMO, so people can pick and choose without
having to transfer everything across the wire every time they access
the information if they want to optimise their applications.


 I for one loved the recent addition of the Page Links set in a
 separate Named Graph, and I don't see how this is different.
 That's great.
 I'd be interested to know how you make use of them?
 We find it very hard to make use of Named Graph data.
 All we start with is a URI for a NIR; so all we can do is resolve it.
 We cache the resulting RDF and then use it for analysis and fresnel
 rendering.
 It is pretty hard to build in anything that takes any notice of Named Graphs
 at arbitrary Linked Data sites. We would need to be able to find the SPARQL
 endpoint from a URI so that we can do the DESCRIBE, and then also be able to
 specify a Named Graph to go with it. In fact, how would I do that from
 http://dbpedia.org/resource/London ?

In short it is difficult, but not impossible if you are aware that
there is some extra information that you want to include for your
users that doesn't come from the URI resolution.

I have been working on a system that can take notice of Named Graphs,
but it doesn't work with arbitrary URIs as it requires the URIs to
be normalised to some scheme that the software recognises. For
instance, the normalised form of http://dbpedia.org/resource/London in
my system is http://domain.name/dbpedia:London, with the domain.name
being specified by the user. By design it doesn't fit with the notion
that URIs are opaque and shouldn't be modified, but it is hard to
deny that it works. Resolving http://qut.bio2rdf.org/dbpedia:London
for instance will include the PageLinks set along with any extensions
that Matthias Samwald has included to link OBO to DBpedia (although in
this case it is unlikely any would exist in this set) and some links
that the DrugBank LODD project provide using their dataset in relation
to DBpedia resources. If you want to know exactly which datasets would
be resolved there is a URI for that...
http://qut.bio2rdf.org/queryplan/dbpedia:London

In some ways it isn't really typical Linked Data, but it allows the
distributed extensions that I think people really want access to in
some cases.

 I'm afraid I find Linked Data (by resolving URIs) really beautiful, and
 think I can understand how I and others might use it. So when it is
 suggested that the way to solve an issue with how it works is to step
 outside the RDFramework, I think it needs to be challenged or brought into
 the Framework.

One way you could do it could be by 

Re: AW: [Dbpedia-discussion] Fwd: Your message to Dbpedia-discussion awaits moderator approval

2009-08-11 Thread Kingsley Idehen

Hugh Glaser wrote:

Hi Kingsley.

On 12/08/2009 00:28, Kingsley Idehen kide...@openlinksw.com wrote:

  

Hugh Glaser wrote:


On 11/08/2009 15:47, Pat Hayes pha...@ihmc.us wrote:

 
  

On Aug 11, 2009, at 5:45 AM, Chris Bizer wrote:

   


Hi Kingsley, Pat and all,

 
  

snip/
 
  

Everything on the Web is a claim by somebody. There are no facts,
there is
no truth, there are only opinions.
 
  

Same is true of the Web and of life in general, but still there are
laws about slander, etc.; and outrageous falsehoods are rebutted or
corrected (eg look at how Wikipedia is managed); or else their source
is widely treated as nonsensical, which I hardly think DBpedia wishes
to be. And also, I think we do have some greater responsibility to
give our poor dumb inference engines a helping hand, since they have
no common sense to help them sort out the wheat from the chaff, unlike
our enlightened human selves.

   


Semantic Web applications must take this into account and therefore
always
assess data quality and trustworthiness before they do something
with the
data.
 
  

I think that this discussion really emphasises how bad it is to put this
co-ref data in the same store as the other data.
 
  

Yes, they should be in distinct Named Graphs.


I thought you would mention Named Graphs :-)
  

This is the point I was making a while back (in relation to Alan's
comments about the same thing).


Yes, but this is the point I was making a while back about Named Graphs as a
solution - when I resolve a URI (follow-my-nose) in the recommended fashion,
I see no Named Graphs - they are only exposed in SPARQL stores.
If I resolve http://dbpedia.org/resource/London to get
http://dbpedia.org/data/London.rdf I see a bunch of RDF - go on, try it. No
sight of Named Graphs.
  
Correct, but the publisher of the Linked Data is putting HTTP URIs in 
front of the content of a Quad Store. These URIs are associated with 
SPARQL queries (in the case of DBpedia). With regard to the great 
example from yesterday, I deliberately put out two different views to 
demonstrate that you can partition data and not break the graph 
traversal desired by the follow-your-nose data exploration and discovery 
pattern. But note, and this is very important, the follow-your-nose 
pattern doesn't eradicate the fact that cul-de-sacs and T-junctions 
will also be part of the Web of Linked Data.
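
A sketch of what those two views amount to at the query level; the second
graph IRI is hypothetical and stands for wherever a contributed linkset might
be loaded:

    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    # The curated DBpedia graph and a separately loaded linkset can be
    # queried together or kept apart; consumers choose which to trust.
    SELECT ?p ?o ?sameAs
    WHERE {
      GRAPH <http://dbpedia.org> {
        <http://dbpedia.org/resource/London> ?p ?o .
      }
      OPTIONAL {
        GRAPH <http://example.org/contributed-links> {  # hypothetical graph
          <http://dbpedia.org/resource/London> owl:sameAs ?sameAs .
        }
      }
    }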



Are you saying that the only way to access Linked Data is via SPARQL?
  

Finding data in dbpedia that is mistaken/wrong/debateable undermines the
whole project - the contract dbpedia offers is to reflect the wikipedia
content that it offers.
 
  

Er. its prime contract is a Name Corpus. In due course there will be
lots of meshes from other domains Linked Data contributors e.g. BBC,
Reuters, New York Times etc..


I really don't think so.
  

In my world view contract doesn't imply sole use or potential :-)

Its prime contract is that I can resolve a URI for a NIR and get back things
like Description, Location, etc..
  

I've written enough about HTTP URIs and their virtues [1].

Hopefully, we will forget the horrible term NIR, really.  It's just about 
data items, their identifiers, and associated metadata.



If it gives me dodgy other stuff that I can't distinguish, I will have to
stop using it, which would be a disaster.
  

The goal of DBpedia was to set the ball rolling and in that regard its
over achieved (albeit from my very biased view point).


Oh yes! - but let's not let it get spoilt.
  
I really believe you are overreacting here. Ironically, you seem to have 
missed the trivial manner in which this data set was erased without any 
effect on DBpedia URIs whatsoever. Even at the time the data was loaded, 
you wouldn't have been able to de-reference this data from DBpedia URIs 
(back to the Named Graph issue above and follow-your-nose) since the 
SPARQL that generates the metadata for DBpedia's HTTP URIs is explicitly 
scoped to Graph IRI: http://dbpedia.org .


Remember, this linkset was basically a set of axioms that could have 
been used solely for backward-chained reasoning via SPARQL pragmas. Said 
SPARQL could even be used as the basis for a different set of HTTP URIs 
that point to the DBpedia ones (without explicit inverse triples in the 
DBpedia graph, and the link property doesn't have to be one that's symmetrical).
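
A sketch of the backward-chaining style alluded to, using Virtuoso's SPARQL
inference pragma as commonly documented (the rule set name is hypothetical and
would have to be created from the linkset graph beforehand; other stores
expose similar switches under different syntax):

    # Virtuoso-specific pragma: apply the axioms in a named rule set at
    # query time instead of materialising inferred triples in the store.
    DEFINE input:inference "yago-freebase-links"
    SELECT DISTINCT ?instance
    WHERE {
      # placeholder class URI; any class covered by the linkset would do
      ?instance a <http://dbpedia.org/class/yago/ExampleClass> .
    }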

Perfection is not an option on the Web or in the real world. We exist in
a continuum that is inherently buggy, by design (otherwise it would be
very boring).


When we engineer things we accept all that - but what we then do is engineer
systems so that they are robust to the imperfections.
  
Sure re. robustness, but ironically you don't quite see the robustness 
and dexterity this whole episode has unveiled re. community discourse 
and rapid resolution, etc. 
We would have had a little problem if the data had been loaded into the 
DBpedia Named