Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Rinke Hoekstra
Hi,

FWIW I'm having the same problems logging in on CKAN at the moment. Google 
doesn't work, and Yahoo gives a 404 (hence the cc to ckan-discuss).

-Rinke


On 8 sep 2010, at 07:31, Peter DeVries wrote:

 I am kind of annoyed by the CKAN site.
 
 1) A lot of this information they are requesting are already provided by my 
 void file, sitemap.xml and SPARQL Endpoint
 2) My google openID does not seem to work on this site and their are no 
 non-openid account setup alternative.
 3) The TaxonConcept data set is much more interlinked that my other data set 
 GeoSpecies.
 
 Respectively,
 
 - Pete
 
 On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:
 Hi all,
 
 we are in the process of drawing the next version of the LOD cloud diagram. 
 This time it is likely to contain around 180 datasets altogether having a 
 size of around 20 billion RDF triples.
 
 For drawing the next version of the LOD cloud, we have started to collect 
 meta-information about the datasets to be included on CKAN, a registry of 
 open data and content packages provided by the Open Knowledge Foundation.
 
 The list of datasets about which we have already collected information is be 
 found here:
 
 http://www4.wiwiss.fu-berlin.de/lodcloud/
 
 In addition to basic meta-information about a dataset such as its size and 
 the number of links pointing at other datasets, we also collect additional 
 meta-information about the license of the dataset, alternative access options 
 like SPARQL endpoints or dataset dumps, and whether there exist a voiD 
 description of the dataset or a Semantic Web Sitemap.
 
 So if your dataset is not listed yet and you want to have it included into 
 the next version of the LOD cloud, please add it to CKAN until next Wednesday 
 (September 8th, 2010).
 
 Also, if we have collected wrong information about your dataset or if your 
 dataset is only partially described up till now, it would be great if you 
 could add the missing information.
 
 Guidelines about how to add datasets to CKAN as well as about the tags that 
 we are using to annotate the datasets are found here:
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
 
 We thank all contributors in advance for their input and help, which 
 hopefully will allow us to draw the next version of the LOD cloud as accurate 
 as possible.
 
 Cheers,
 
 Anja Jentzsch, Richard Cyganiak, Chris Bizer
 
 
 
 
 -- 
 
 Pete DeVries
 Department of Entomology
 University of Wisconsin - Madison
 445 Russell Laboratories
 1630 Linden Drive
 Madison, WI 53706
 TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
 About the GeoSpecies Knowledge Base
 


---
Dr Rinke Hoekstra

AI Department |   Leibniz Center for Law
Faculty of Sciences   |   Faculty of Law
Vrije Universiteit|   Universiteit van Amsterdam
De Boelelaan 1081a|   Kloveniersburgwal 48  
1081 HV Amsterdam |   1012 CX  Amsterdam
+31-(0)20-5987752 |   +31-(0)20-5253497 
hoeks...@few.vu.nl|   hoeks...@uva.nl   

Homepage: http://www.few.vu.nl/~hoekstra







Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Anja Jentzsch
Hi Pete,

 I am kind of annoyed by the CKAN site.
 
 1) A lot of this information they are requesting are already provided by my 
 void file, sitemap.xml and SPARQL Endpoint

we are aiming at CKAN being the data registry for the LOD datasets and for 
people to easily get the information needed on datasets for using them. So it 
is essential to provide information like links between datasets and links to 
voiD file, SPARQL endpoint, license etc..
As far as the voiD file and sitemap can be found on the project websites we 
already pulled the data from them and put it into CKAN. For TaxonConcept this 
isn't the case. So please add the information to CKAN or at least the links to 
the voiD file and sitemap.xml.

 2) My google openID does not seem to work on this site and their are no 
 non-openid account setup alternative.

I forwarded this to the CKAN team. But you can still edit without logging in.

 3) The TaxonConcept data set is much more interlinked that my other data set 
 GeoSpecies.

So please provide us with either the stats or the links to voiD and sitemap.

Cheers,
Anja

 Respectively,
 
 - Pete
 
 On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:
 Hi all,
 
 we are in the process of drawing the next version of the LOD cloud diagram. 
 This time it is likely to contain around 180 datasets altogether having a 
 size of around 20 billion RDF triples.
 
 For drawing the next version of the LOD cloud, we have started to collect 
 meta-information about the datasets to be included on CKAN, a registry of 
 open data and content packages provided by the Open Knowledge Foundation.
 
 The list of datasets about which we have already collected information is be 
 found here:
 
 http://www4.wiwiss.fu-berlin.de/lodcloud/
 
 In addition to basic meta-information about a dataset such as its size and 
 the number of links pointing at other datasets, we also collect additional 
 meta-information about the license of the dataset, alternative access options 
 like SPARQL endpoints or dataset dumps, and whether there exist a voiD 
 description of the dataset or a Semantic Web Sitemap.
 
 So if your dataset is not listed yet and you want to have it included into 
 the next version of the LOD cloud, please add it to CKAN until next Wednesday 
 (September 8th, 2010).
 
 Also, if we have collected wrong information about your dataset or if your 
 dataset is only partially described up till now, it would be great if you 
 could add the missing information.
 
 Guidelines about how to add datasets to CKAN as well as about the tags that 
 we are using to annotate the datasets are found here:
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
 
 We thank all contributors in advance for their input and help, which 
 hopefully will allow us to draw the next version of the LOD cloud as accurate 
 as possible.
 
 Cheers,
 
 Anja Jentzsch, Richard Cyganiak, Chris Bizer
 
 
 
 
 -- 
 
 Pete DeVries
 Department of Entomology
 University of Wisconsin - Madison
 445 Russell Laboratories
 1630 Linden Drive
 Madison, WI 53706
 TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
 About the GeoSpecies Knowledge Base
 




Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Ted Thibodeau Jr

On Sep 8, 2010, at 01:31 AM, Peter DeVries wrote:

 I am kind of annoyed by the CKAN site.

I'm right there with you, Peter.

Anja, you say you can edit without logging in but please note that
the doc page [1] about this database says --

   • Please register to CKAN bevor editing or adding any packages.

When I ignore that and do dive into editing DBpedia's listing, 
I discover --

- The notes field uses Markdown markup, which I've never 
  encountered anywhere else, and must now learn (or fake).

- There must be a singular author, with a singular email address.
  DBpedia doesn't have a singular author, and there are several
  URIs which might be relevant to have here -- and they are not
  mailto: URIs.  The best is an http: URI ... but there is no way
  to make this present, except as part of the literal associated
  with the mailto: URI.

- There must be a singular maintainer, with a singular email address.
  Same issues as with author.

- There are 14+ CKAN Resource links listed [2] in the documentation, 
  but the form appears to only take 5 (at least, 4 were previously
  filled on the DBpedia page, and filling in the 5th didn't magically
  cause a 6th to open, nor was there a link to create a 6th).  OH!
  Until I Preview the page -- and now there's an empty set of 
  Resource boxes ... so I can add one more, and Preview, and maybe
  add one more, and Preview, and maybe  Painful.

- The licensure choices separate CC-ShareAlike and CC-Attribution, but
  do not list CC-Attribution-ShareAlike [3].  cc-by-sa is distinct from
  cc-by -- and also from cc-by-nc-sa (CC-Attribution-NonCommercial-
  ShareAlike), among others.  Clarity of presentation is VERY important 
  for licensing!

- There appears to be an arbitrary limit on the number of Extras 
  key-value pairs associated with any given data set ... which means
  that *truly* densely connected data sets will be short-changed.

From all I can see here, this is an RDB-based thing, not RDF-based.  That's 
disappointing, to say the least.

All in all, the experience is challenging at best, when listing one 
data set.  But I have several more to deal with, and today's the 
deadline!  Hurrah!

*sighs*

Ted



[1] 
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#How_do_I_add_a_dataset_to_CKAN_or_edit_an_existing_dataset.3F
[2] 
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#CKAN_resource_links
[3] 
http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License




--
A: Yes.  http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
Evangelism  Support //mailto:tthibod...@openlinksw.com
 //  http://twitter.com/TallTed
OpenLink Software, Inc.  //  http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
 http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs  http://www.openlinksw.com/weblogs/virtuoso/
   http://www.openlinksw.com/blog/~kidehen/
Universal Data Access and Virtual Database Technology Providers







Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Leigh Dodds
Hi,

I've updated several packagea and knot had any issues. While CKAN may
not be to everyone's taste, it's a much, much better than the previous
approaches which were largely opaque

The fact that we will now have more structured data describing the
cloud so that it can be analysed is another big win. Converting the
data to RDF is easy. The CKAN API is simple and easy to use.

Maybe the grumbling can be converted into useful contributions to the
CKAN code base, which is open source and being used by a number of
different organisations. No one ga bothered to create anything better
in the past, so using what's available and looking for ways to improve
it seems like a more constructive approach IMHO.


Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread William Waites
On 10-09-08 17:47, Ted Thibodeau Jr wrote:
 
 On Sep 8, 2010, at 01:31 AM, Peter DeVries wrote:
 
 I am kind of annoyed by the CKAN site.
 
 I'm right there with you, Peter.
 
 Anja, you say you can edit without logging in but please note that
 the doc page [1] about this database says --
 
• Please register to CKAN bevor editing or adding any packages.

The login issues should be fixed now. Something had changed
at Google and Yahoo that was causing them to return 501
Unimplemented errors when the association was made. Updating
python-openid to a newer version (2.2.5) appears to have
solved the problem. Please let me know if anyone has further
troubles logging in.

 
 When I ignore that and do dive into editing DBpedia's listing, 
 I discover --

(Leaving your comments intact for the ckan-discuss list, you
are correct that it is an RDBMS system and that starts
showing through clearly when people used to thinking in an
RDF or EAV way start throwing data at it. I am particularly
interested in looking at ways to improve this, keeping in
mind that it is a running system with real users and a lot
of effort that has gone into building it -- so we need to be
gentle).

 - The notes field uses Markdown markup, which I've never 
   encountered anywhere else, and must now learn (or fake).
 
 - There must be a singular author, with a singular email address.
   DBpedia doesn't have a singular author, and there are several
   URIs which might be relevant to have here -- and they are not
   mailto: URIs.  The best is an http: URI ... but there is no way
   to make this present, except as part of the literal associated
   with the mailto: URI.
 
 - There must be a singular maintainer, with a singular email address.
   Same issues as with author.
 
 - There are 14+ CKAN Resource links listed [2] in the documentation, 
   but the form appears to only take 5 (at least, 4 were previously
   filled on the DBpedia page, and filling in the 5th didn't magically
   cause a 6th to open, nor was there a link to create a 6th).  OH!
   Until I Preview the page -- and now there's an empty set of 
   Resource boxes ... so I can add one more, and Preview, and maybe
   add one more, and Preview, and maybe  Painful.
 
 - The licensure choices separate CC-ShareAlike and CC-Attribution, but
   do not list CC-Attribution-ShareAlike [3].  cc-by-sa is distinct from
   cc-by -- and also from cc-by-nc-sa (CC-Attribution-NonCommercial-
   ShareAlike), among others.  Clarity of presentation is VERY important 
   for licensing!
 
 - There appears to be an arbitrary limit on the number of Extras 
   key-value pairs associated with any given data set ... which means
   that *truly* densely connected data sets will be short-changed.
 
 From all I can see here, this is an RDB-based thing, not RDF-based.  That's 
 disappointing, to say the least.
 
 All in all, the experience is challenging at best, when listing one 
 data set.  But I have several more to deal with, and today's the 
 deadline!  Hurrah!
 
 *sighs*
 
 Ted
 
 
 
 [1] 
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#How_do_I_add_a_dataset_to_CKAN_or_edit_an_existing_dataset.3F
 [2] 
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#CKAN_resource_links
 [3] 
 http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
 
 
 
 
 --
 A: Yes.  http://www.guckes.net/faq/attribution.html
 | Q: Are you sure?
 | | A: Because it reverses the logical flow of conversation.
 | | | Q: Why is top posting frowned upon?
 
 Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
 Evangelism  Support //mailto:tthibod...@openlinksw.com
  //  http://twitter.com/TallTed
 OpenLink Software, Inc.  //  http://www.openlinksw.com/
 10 Burlington Mall Road, Suite 265, Burlington MA 01803
  http://www.openlinksw.com/weblogs/uda/
 OpenLink Blogs  http://www.openlinksw.com/weblogs/virtuoso/
http://www.openlinksw.com/blog/~kidehen/
 Universal Data Access and Virtual Database Technology Providers
 
 
 
 
 


-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread William Waites
On 10-09-08 18:36, Leigh Dodds wrote:
 Hi,
 
 I've updated several packagea and knot had any issues. While CKAN may
 not be to everyone's taste, it's a much, much better than the previous
 approaches which were largely opaque

Thank you for this Leigh.

 The fact that we will now have more structured data describing the
 cloud so that it can be analysed is another big win. Converting the
 data to RDF is easy. The CKAN API is simple and easy to use.

In fact, for python hackers, there is http://bitbucket.org/ww/ckanrdf
which is another kettle of fish, but it will crawl the CKAN API and
put a DCAT representation into an rdflib store (the precise way this
is handled -- see http://ordf.org/ is another topic that I would be
very happy to discuss). Anyone is, of course, perfectly welcome to
roll their own as was done for the current LOD work.

 Maybe the grumbling can be converted into useful contributions to the
 CKAN code base, which is open source and being used by a number of
 different organisations. No one ga bothered to create anything better
 in the past, so using what's available and looking for ways to improve
 it seems like a more constructive approach IMHO.

I would suggest that unless there is particular interest on the
public-lod list about the workings of CKAN and how it could be
improved that we could continue discussion on the
ckan-disc...@lists.okfn.org list. Contributions of code, ideas and
(constructive) criticism alike are more than welcome.

Cheers,
-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Juan Sequeda
Apologies for this dumb question, but how do I register a dataset to the
lodcloud group?


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:

 Hi all,

 we are in the process of drawing the next version of the LOD cloud diagram.
 This time it is likely to contain around 180 datasets altogether having a
 size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is
 be found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags that
 we are using to annotate the datasets are found here:

 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

 Cheers,

 Anja Jentzsch, Richard Cyganiak, Chris Bizer




Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Peter DeVries
I was able to create a new OpenID and use that to update the documentation
for the following entries:

http://www.ckan.net/package/taxonconcept

http://www.ckan.net/package/geospecies

I have a number of links to GeoNames in GeoSpecies, but have not been able
to get a count yet.

This RDF will give you some idea of the GeoSpecies interlinking.

http://lod.taxonconcept.org/ses/mCcSp.rdf

There is some overlap between the GeoSpecies and TaxonConcept data sets that
will be cleaned up in the future.

There was a reason for this which will make sense in the future :-)

For now, the GeoSpecies species are a subset of all the species in
TaxonConcept, but with a slightly different conceptualization.

Both are available at the sparql end point
http://lsd.taxonconcept.org/sparql

along with the EUNIS and other related data sets. (I am currently updating
the endpoint's data from in Bio2RDF and the GNI.)

There is more information available here:
http://www.taxonconcept.org/sparql-endpoint/

Respectfully,

- Pete




On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:

 Hi all,

 we are in the process of drawing the next version of the LOD cloud diagram.
 This time it is likely to contain around 180 datasets altogether having a
 size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is
 be found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags that
 we are using to annotate the datasets are found here:

 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

 Cheers,

 Anja Jentzsch, Richard Cyganiak, Chris Bizer




-- 

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies
Knowledge Base http://lod.geospecies.org/
About the GeoSpecies Knowledge Base http://about.geospecies.org/



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread Juan Sequeda
Hi Pete

This is really cool. I just added our dataset: Fishes of Texas, which we
link to TaxonConcept :)

http://ckan.net/package/fishes-of-texas


However, I don't see an option where I can add this dataset to the lodcloud
group. However, I have all the tags. I hope that is enough.

We are currently in the process of shifting servers so the the uris aren't
working right now, but it should all be up and running by the end of the
week.

Cheers

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Sep 8, 2010 at 8:42 PM, Peter DeVries pete.devr...@gmail.comwrote:

 I was able to create a new OpenID and use that to update the documentation
 for the following entries:

 http://www.ckan.net/package/taxonconcept

 http://www.ckan.net/package/geospecies

 I have a number of links to GeoNames in GeoSpecies, but have not been able
 to get a count yet.

 This RDF will give you some idea of the GeoSpecies interlinking.

 http://lod.taxonconcept.org/ses/mCcSp.rdf

 There is some overlap between the GeoSpecies and TaxonConcept data sets
 that will be cleaned up in the future.

 There was a reason for this which will make sense in the future :-)

 For now, the GeoSpecies species are a subset of all the species in
 TaxonConcept, but with a slightly different conceptualization.

 Both are available at the sparql end point
 http://lsd.taxonconcept.org/sparql

 along with the EUNIS and other related data sets. (I am currently updating
 the endpoint's data from in Bio2RDF and the GNI.)

 There is more information available here:
 http://www.taxonconcept.org/sparql-endpoint/

 Respectfully,

 - Pete




 On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:

 Hi all,

 we are in the process of drawing the next version of the LOD cloud
 diagram. This time it is likely to contain around 180 datasets altogether
 having a size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is
 be found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags
 that we are using to annotate the datasets are found here:

 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

 Cheers,

 Anja Jentzsch, Richard Cyganiak, Chris Bizer




 --
 
 Pete DeVries
 Department of Entomology
 University of Wisconsin - Madison
 445 Russell Laboratories
 1630 Linden Drive
 Madison, WI 53706
 TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies
 Knowledge Base http://lod.geospecies.org/
 About the GeoSpecies Knowledge Base http://about.geospecies.org/
 



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-07 Thread Peter DeVries
I am kind of annoyed by the CKAN site.

1) A lot of this information they are requesting are already provided by my
void file, sitemap.xml and SPARQL Endpoint
2) My google openID does not seem to work on this site and their are no
non-openid account setup alternative.
3) The TaxonConcept data set is much more interlinked that my other data set
GeoSpecies.

Respectively,

- Pete

On Thu, Sep 2, 2010 at 1:10 PM, Anja Jentzsch a...@anjeve.de wrote:

 Hi all,

 we are in the process of drawing the next version of the LOD cloud diagram.
 This time it is likely to contain around 180 datasets altogether having a
 size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is
 be found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags that
 we are using to annotate the datasets are found here:

 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

 Cheers,

 Anja Jentzsch, Richard Cyganiak, Chris Bizer




-- 

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base http://www.taxonconcept.org/ / GeoSpecies
Knowledge Base http://lod.geospecies.org/
About the GeoSpecies Knowledge Base http://about.geospecies.org/



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-06 Thread Jonathan Gray
(cc'ing ckan-discuss)

Yes - I think the front page used to say:

CKAN is a registry of open data and content packages (and some closed ones)

We should probably revert to something like this wording to avoid
confusion. The main focus of CKAN is, of course, data which is open as
in opendefinition.org as a baseline (though at the OKF we also promote
different standards in different domains - such as pantoprinciples.org
for science). In my opinion a major reason for adding non-open data,
or data where licensing is not clear, is to highlight this to a
broader community of prospective users, to use in combination with
services to clarify legal status (like isitopen.org which Chris
mentioned) and to ultimately to encourage the adoption of an open
license.

Perhaps a good analogy is main, universe and multiverse repositories
in free/open source software package management?

https://help.ubuntu.com/community/Repositories/Ubuntu


On Sunday, September 5, 2010, Kingsley Idehen kide...@openlinksw.com wrote:
  On 9/5/10 11:00 AM, Alan Ruttenberg wrote:

 On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizerch...@bizer.de  wrote:

 Hi Alan,


 I have just spent some time evaluating one source and reported to you
 the result. Perhaps you might act on this investment in time and thank
 me for doing so. You might find that the result was myself and more
 people doing such quality control.

 Sorry that my reply yesterday might have been a bit too harsh.

 I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
 and added a reference to the description of the CAS dataset at

 http://ckan.net/package/bio2rdf-cas

 Please also note that CKAN provides a rating function for the datasets and
 also provides for commenting and discussing the datasets.

 Maybe people could use these features as a start to collect quality-related
 meta-information about the datasets.

 CKAN also provides a link to the http://www.isitopendata.org/ service, which
 might be used for license inquiries.

 Dear Chris,

 As I said, the first line on the CKAN home page says: CKAN is a
 registry of open data and content packages.. Therefore I think there
 is a reasonable expectation that the packages registered there are
 open. I maintain that CKAN should either change how it explains itself
 to make clear that it is a registry of packages that may or may not be
 open, or it should remove the packages that are not known to be open.
 I'm not taking a position one way or another which they should do
 (that's their business), but they should say what they do, and do what
 they say.

 Thank you for your pointers to further information on how to find
 licenses. I'm fairly familiar with this area given that I work for
 Creative Commons.


 Chris,

 The critical point here is that CKAN should simply make the correct Alan is 
 suggesting. As you know, we don't need misleading headlines in the LOD realm, 
 it ultimately causes problems.

 Anyway, this is maybe more of a CKAN issue, so I am hoping that Jonathan is 
 reading this thread and takes this as a cue to fix the title, that's all. 
 Basically, this is about publicly available structured data that may or may 
 not be Open. Basically, making something available to the public still 
 doesn't imply that it's actually Open etc..


 I think we can fix this little issue.

 Kingsley


 I agree with you that the quality of Linked Data published on the Web is
 crucial, but we also have to take into account that much of the data in the
 LOD cloud is currently still published by research projects in order to
 demonstrate the technologies.

 As the Web of Data is evolving and more and more actual owners of the
 datasets start to provide them as Linked Data, I hope that the quality will
 also increase and the datasets will be keep current. Encouraging
 developments into this direction currently happen in the libraries,
 eGovernment, and eCommerce domains.

 I agree that these are good examples. I would suggest that you focus
 on including the good examples in the LOD cloud, or at a minimum
 remove those, like CAS, that fall below the minimal standard of
 supplying *some* data and being *open*, so that linked open data
 means something coherent.


 On the other hand, the Web is an open system and we will thus always see
 people publishing low-quality, wrong and misleading data. Google handles
 this fact rather successfully using PageRank. As the Web of Data provides
 more structure then the classic Web, I think we might even be able to apply
 more sophisticated data-quality assessment heuristics to decide which data
 we want to use in our applications and which to ignore. Some of these
 methods are listed in [1].

 Look, Chris, I just did a manual page rank on the CAS dataset. It is
 meaningless.  This is a high quality assessment. If the movement can't
 act on known good quality information I (and others) will doubt that
 automatic algorithms will be credible.

 Moreover, the LOD cloud diagram is an 

Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-06 Thread Yves Raimond
Hello!

On Thu, Sep 2, 2010 at 7:10 PM, Anja Jentzsch a...@anjeve.de wrote:
 Hi all,

 we are in the process of drawing the next version of the LOD cloud diagram.
 This time it is likely to contain around 180 datasets altogether having a
 size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is be
 found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

I just spotted that BBC Music was missing from this list (Nick Humfrey
will be able to give more detailed stats about it, if needed).
Also, BBC Programmes is around 60M triples now.

Best,
y


 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags that
 we are using to annotate the datasets are found here:
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

 Cheers,

 Anja Jentzsch, Richard Cyganiak, Chris Bizer





AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Chris Bizer
Hi Alan,

 I have just spent some time evaluating one source and reported to you 
 the result. Perhaps you might act on this investment in time and thank 
 me for doing so. You might find that the result was myself and more 
 people doing such quality control.

Sorry that my reply yesterday might have been a bit too harsh.

I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
and added a reference to the description of the CAS dataset at

http://ckan.net/package/bio2rdf-cas

Please also note that CKAN provides a rating function for the datasets and
also provides for commenting and discussing the datasets.

Maybe people could use these features as a start to collect quality-related
meta-information about the datasets.

CKAN also provides a link to the http://www.isitopendata.org/ service, which
might be used for license inquiries.

I agree with you that the quality of Linked Data published on the Web is
crucial, but we also have to take into account that much of the data in the
LOD cloud is currently still published by research projects in order to
demonstrate the technologies.

As the Web of Data is evolving and more and more actual owners of the
datasets start to provide them as Linked Data, I hope that the quality will
also increase and the datasets will be keep current. Encouraging
developments into this direction currently happen in the libraries,
eGovernment, and eCommerce domains. 

On the other hand, the Web is an open system and we will thus always see
people publishing low-quality, wrong and misleading data. Google handles
this fact rather successfully using PageRank. As the Web of Data provides
more structure then the classic Web, I think we might even be able to apply
more sophisticated data-quality assessment heuristics to decide which data
we want to use in our applications and which to ignore. Some of these
methods are listed in [1].

Best, 

Chris 

[1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
using the WIQA policy framework. Journal of Web Semantics: Science, Services
and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
1-10.
http://dx.doi.org/10.1016/j.websem.2008.02.005


-Ursprüngliche Nachricht-
Von: Alan Ruttenberg [mailto:alanruttenb...@gmail.com] 
Gesendet: Samstag, 4. September 2010 22:20
An: Chris Bizer
Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer ch...@bizer.de wrote:
 So rather than to criticize the work that other people do on collecting
 meta-information about the datasets in the LOD cloud

Did you read what I wrote? I made no comment on the adequacy of
metainformation. In fact I *used* that metainformation to point out
that the data source in question did not satisfy the open provision
of linked *open* data. In addition I criticized the *inclusion* of the
data set in the *lod cloud diagram* because of this lack of openness
and because the actual content of that resource didn't resemble any
data in the resource that it was derived from (a registry of
information about chemical compounds), suggesting that it would hurt
the LOD effort as inclusion would be a kind of false advertising.

-Alan




AW: AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Chris Bizer
Hi Tim,

 Swoogle has such metadata for the documents it has indexed.  
 Perhaps we can extract and publish statistics for the key LOD datasets.

This would be great!

Chris

-Ursprüngliche Nachricht-
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag
von Tim Finin
Gesendet: Samstag, 4. September 2010 22:19
An: public-lod@w3.org
Betreff: Re: AW: Next version of the LOD cloud diagram. Please provide
input, so that your dataset is included.

On 9/4/10 4:01 PM, Chris Bizer wrote:
 But I guess there are also limits to the meta-data that people can gather
 manually. So the best would be if somebody would run a crawler and extract
 meta-data about vocabulary usage and other usage pattern directly from the
 LOD datasets. Nobody has done this yet but hopefully somebody will soon
 start doing this.

Swoogle has such metadata for the documents it has indexed.  Perhaps we
can extract and publish statistics for the key LOD datasets.





Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Hugh Glaser
Wow Anja, that is so cool that you guys have done all that stuff on CKAN for
us - many thanks.

In reviewing as requested, I have a question please:

I have some datasets that appear in
Datasets that require more detailed information
such as http://ckan.net/package/rkb-explorer-jisc, but I can't see the
difference between that and something like
http://ckan.net/package/rkb-explorer-ieee which is in the
Datasets in the next LOD Cloud
Section.
Can you give me a hint of what needs to be done to fix please?

Best
Hugh

On 02/09/2010 19:10, Anja Jentzsch a...@anjeve.de wrote:

 Hi all,
 
 we are in the process of drawing the next version of the LOD cloud
 diagram. This time it is likely to contain around 180 datasets
 altogether having a size of around 20 billion RDF triples.
 
 For drawing the next version of the LOD cloud, we have started to
 collect meta-information about the datasets to be included on CKAN, a
 registry of open data and content packages provided by the Open
 Knowledge Foundation.
 
 The list of datasets about which we have already collected information
 is be found here:
 
 http://www4.wiwiss.fu-berlin.de/lodcloud/
 
 In addition to basic meta-information about a dataset such as its size
 and the number of links pointing at other datasets, we also collect
 additional meta-information about the license of the dataset,
 alternative access options like SPARQL endpoints or dataset dumps, and
 whether there exist a voiD description of the dataset or a Semantic Web
 Sitemap.
 
 So if your dataset is not listed yet and you want to have it included
 into the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).
 
 Also, if we have collected wrong information about your dataset or if
 your dataset is only partially described up till now, it would be great
 if you could add the missing information.
 
 Guidelines about how to add datasets to CKAN as well as about the tags
 that we are using to annotate the datasets are found here:
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANme
 tainformation
 
 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.
 
 Cheers,
 
 Anja Jentzsch, Richard Cyganiak, Chris Bizer
 




Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Alan Ruttenberg
On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizer ch...@bizer.de wrote:
 Hi Alan,

 I have just spent some time evaluating one source and reported to you
 the result. Perhaps you might act on this investment in time and thank
 me for doing so. You might find that the result was myself and more
 people doing such quality control.

 Sorry that my reply yesterday might have been a bit too harsh.

 I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
 and added a reference to the description of the CAS dataset at

 http://ckan.net/package/bio2rdf-cas

 Please also note that CKAN provides a rating function for the datasets and
 also provides for commenting and discussing the datasets.

 Maybe people could use these features as a start to collect quality-related
 meta-information about the datasets.

 CKAN also provides a link to the http://www.isitopendata.org/ service, which
 might be used for license inquiries.

Dear Chris,

As I said, the first line on the CKAN home page says: CKAN is a
registry of open data and content packages.. Therefore I think there
is a reasonable expectation that the packages registered there are
open. I maintain that CKAN should either change how it explains itself
to make clear that it is a registry of packages that may or may not be
open, or it should remove the packages that are not known to be open.
I'm not taking a position one way or another which they should do
(that's their business), but they should say what they do, and do what
they say.

Thank you for your pointers to further information on how to find
licenses. I'm fairly familiar with this area given that I work for
Creative Commons.

 I agree with you that the quality of Linked Data published on the Web is
 crucial, but we also have to take into account that much of the data in the
 LOD cloud is currently still published by research projects in order to
 demonstrate the technologies.

 As the Web of Data is evolving and more and more actual owners of the
 datasets start to provide them as Linked Data, I hope that the quality will
 also increase and the datasets will be keep current. Encouraging
 developments into this direction currently happen in the libraries,
 eGovernment, and eCommerce domains.

I agree that these are good examples. I would suggest that you focus
on including the good examples in the LOD cloud, or at a minimum
remove those, like CAS, that fall below the minimal standard of
supplying *some* data and being *open*, so that linked open data
means something coherent.

 On the other hand, the Web is an open system and we will thus always see
 people publishing low-quality, wrong and misleading data. Google handles
 this fact rather successfully using PageRank. As the Web of Data provides
 more structure then the classic Web, I think we might even be able to apply
 more sophisticated data-quality assessment heuristics to decide which data
 we want to use in our applications and which to ignore. Some of these
 methods are listed in [1].

Look, Chris, I just did a manual page rank on the CAS dataset. It is
meaningless.  This is a high quality assessment. If the movement can't
act on known good quality information I (and others) will doubt that
automatic algorithms will be credible.

Moreover, the LOD cloud diagram is an advertisement. There are enough
data sets now that inclusion in the diagram can become a reward for
good work. It's not good advertising for Google when junk sites come
up at the top of search results and they do their best to minimize
this occurrence. The LOD cloud is your front page, and to a certain
extent mine as well as I invest all my time in doing work towards
building the web of data in the Sciences.

Regards,
Alan


 Best,

 Chris

 [1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
 using the WIQA policy framework. Journal of Web Semantics: Science, Services
 and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
 1-10.
 http://dx.doi.org/10.1016/j.websem.2008.02.005


 -Ursprüngliche Nachricht-
 Von: Alan Ruttenberg [mailto:alanruttenb...@gmail.com]
 Gesendet: Samstag, 4. September 2010 22:20
 An: Chris Bizer
 Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
 Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
 that your dataset is included.

 On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer ch...@bizer.de wrote:
 So rather than to criticize the work that other people do on collecting
 meta-information about the datasets in the LOD cloud

 Did you read what I wrote? I made no comment on the adequacy of
 metainformation. In fact I *used* that metainformation to point out
 that the data source in question did not satisfy the open provision
 of linked *open* data. In addition I criticized the *inclusion* of the
 data set in the *lod cloud diagram* because of this lack of openness
 and because the actual content of that resource didn't resemble any
 data in the resource that it was 

Re: AW: AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Kingsley Idehen

 On 9/5/10 5:08 AM, Chris Bizer wrote:

Hi Tim,


Swoogle has such metadata for the documents it has indexed.
Perhaps we can extract and publish statistics for the key LOD datasets.

This would be great!


Tim,

Is the metadata (in total) from Swoogle available to the public in RDF 
form? If so, a resource link would be great.


Kingsley

Chris

-Ursprüngliche Nachricht-
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag
von Tim Finin
Gesendet: Samstag, 4. September 2010 22:19
An: public-lod@w3.org
Betreff: Re: AW: Next version of the LOD cloud diagram. Please provide
input, so that your dataset is included.

On 9/4/10 4:01 PM, Chris Bizer wrote:

But I guess there are also limits to the meta-data that people can gather
manually. So the best would be if somebody would run a crawler and extract
meta-data about vocabulary usage and other usage pattern directly from the
LOD datasets. Nobody has done this yet but hopefully somebody will soon
start doing this.

Swoogle has such metadata for the documents it has indexed.  Perhaps we
can extract and publish statistics for the key LOD datasets.







--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Kingsley Idehen

 On 9/5/10 11:00 AM, Alan Ruttenberg wrote:

On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizerch...@bizer.de  wrote:

Hi Alan,


I have just spent some time evaluating one source and reported to you
the result. Perhaps you might act on this investment in time and thank
me for doing so. You might find that the result was myself and more
people doing such quality control.

Sorry that my reply yesterday might have been a bit too harsh.

I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
and added a reference to the description of the CAS dataset at

http://ckan.net/package/bio2rdf-cas

Please also note that CKAN provides a rating function for the datasets and
also provides for commenting and discussing the datasets.

Maybe people could use these features as a start to collect quality-related
meta-information about the datasets.

CKAN also provides a link to the http://www.isitopendata.org/ service, which
might be used for license inquiries.

Dear Chris,

As I said, the first line on the CKAN home page says: CKAN is a
registry of open data and content packages.. Therefore I think there
is a reasonable expectation that the packages registered there are
open. I maintain that CKAN should either change how it explains itself
to make clear that it is a registry of packages that may or may not be
open, or it should remove the packages that are not known to be open.
I'm not taking a position one way or another which they should do
(that's their business), but they should say what they do, and do what
they say.

Thank you for your pointers to further information on how to find
licenses. I'm fairly familiar with this area given that I work for
Creative Commons.


Chris,

The critical point here is that CKAN should simply make the correct Alan 
is suggesting. As you know, we don't need misleading headlines in the 
LOD realm, it ultimately causes problems.


Anyway, this is maybe more of a CKAN issue, so I am hoping that Jonathan 
is reading this thread and takes this as a cue to fix the title, that's 
all. Basically, this is about publicly available structured data that 
may or may not be Open. Basically, making something available to the 
public still doesn't imply that it's actually Open etc..



I think we can fix this little issue.

Kingsley


I agree with you that the quality of Linked Data published on the Web is
crucial, but we also have to take into account that much of the data in the
LOD cloud is currently still published by research projects in order to
demonstrate the technologies.

As the Web of Data is evolving and more and more actual owners of the
datasets start to provide them as Linked Data, I hope that the quality will
also increase and the datasets will be keep current. Encouraging
developments into this direction currently happen in the libraries,
eGovernment, and eCommerce domains.

I agree that these are good examples. I would suggest that you focus
on including the good examples in the LOD cloud, or at a minimum
remove those, like CAS, that fall below the minimal standard of
supplying *some* data and being *open*, so that linked open data
means something coherent.


On the other hand, the Web is an open system and we will thus always see
people publishing low-quality, wrong and misleading data. Google handles
this fact rather successfully using PageRank. As the Web of Data provides
more structure then the classic Web, I think we might even be able to apply
more sophisticated data-quality assessment heuristics to decide which data
we want to use in our applications and which to ignore. Some of these
methods are listed in [1].

Look, Chris, I just did a manual page rank on the CAS dataset. It is
meaningless.  This is a high quality assessment. If the movement can't
act on known good quality information I (and others) will doubt that
automatic algorithms will be credible.

Moreover, the LOD cloud diagram is an advertisement. There are enough
data sets now that inclusion in the diagram can become a reward for
good work. It's not good advertising for Google when junk sites come
up at the top of search results and they do their best to minimize
this occurrence. The LOD cloud is your front page, and to a certain
extent mine as well as I invest all my time in doing work towards
building the web of data in the Sciences.

Regards,
Alan


Best,

Chris

[1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
using the WIQA policy framework. Journal of Web Semantics: Science, Services
and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
1-10.
http://dx.doi.org/10.1016/j.websem.2008.02.005


-Ursprüngliche Nachricht-
Von: Alan Ruttenberg [mailto:alanruttenb...@gmail.com]
Gesendet: Samstag, 4. September 2010 22:20
An: Chris Bizer
Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 3:43 PM, Chris 

Contd: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Kingsley Idehen

 On 9/5/10 11:00 AM, Alan Ruttenberg wrote:

On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizerch...@bizer.de  wrote:

Hi Alan,


I have just spent some time evaluating one source and reported to you
the result. Perhaps you might act on this investment in time and thank
me for doing so. You might find that the result was myself and more
people doing such quality control.

Sorry that my reply yesterday might have been a bit too harsh.

I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
and added a reference to the description of the CAS dataset at

http://ckan.net/package/bio2rdf-cas

Please also note that CKAN provides a rating function for the datasets and
also provides for commenting and discussing the datasets.

Maybe people could use these features as a start to collect quality-related
meta-information about the datasets.

CKAN also provides a link to the http://www.isitopendata.org/ service, which
might be used for license inquiries.

Dear Chris,

As I said, the first line on the CKAN home page says: CKAN is a
registry of open data and content packages.. Therefore I think there
is a reasonable expectation that the packages registered there are
open. I maintain that CKAN should either change how it explains itself
to make clear that it is a registry of packages that may or may not be
open, or it should remove the packages that are not known to be open.
I'm not taking a position one way or another which they should do
(that's their business), but they should say what they do, and do what
they say.

Thank you for your pointers to further information on how to find
licenses. I'm fairly familiar with this area given that I work for
Creative Commons.


I agree with you that the quality of Linked Data published on the Web is
crucial, but we also have to take into account that much of the data in the
LOD cloud is currently still published by research projects in order to
demonstrate the technologies.

As the Web of Data is evolving and more and more actual owners of the
datasets start to provide them as Linked Data, I hope that the quality will
also increase and the datasets will be keep current. Encouraging
developments into this direction currently happen in the libraries,
eGovernment, and eCommerce domains.

I agree that these are good examples. I would suggest that you focus
on including the good examples in the LOD cloud, or at a minimum
remove those, like CAS, that fall below the minimal standard of
supplying *some* data and being *open*, so that linked open data
means something coherent.


On the other hand, the Web is an open system and we will thus always see
people publishing low-quality, wrong and misleading data. Google handles
this fact rather successfully using PageRank. As the Web of Data provides
more structure then the classic Web, I think we might even be able to apply
more sophisticated data-quality assessment heuristics to decide which data
we want to use in our applications and which to ignore. Some of these
methods are listed in [1].

Look, Chris, I just did a manual page rank on the CAS dataset. It is
meaningless.  This is a high quality assessment. If the movement can't
act on known good quality information I (and others) will doubt that
automatic algorithms will be credible.

Moreover, the LOD cloud diagram is an advertisement. There are enough
data sets now that inclusion in the diagram can become a reward for
good work. It's not good advertising for Google when junk sites come
up at the top of search results and they do their best to minimize
this occurrence. The LOD cloud is your front page, and to a certain
extent mine as well as I invest all my time in doing work towards
building the web of data in the Sciences.

Regards,
Alan


Best,

Chris

[1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
using the WIQA policy framework. Journal of Web Semantics: Science, Services
and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
1-10.
http://dx.doi.org/10.1016/j.websem.2008.02.005


-Ursprüngliche Nachricht-
Von: Alan Ruttenberg [mailto:alanruttenb...@gmail.com]
Gesendet: Samstag, 4. September 2010 22:20
An: Chris Bizer
Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizerch...@bizer.de  wrote:

So rather than to criticize the work that other people do on collecting
meta-information about the datasets in the LOD cloud

Did you read what I wrote? I made no comment on the adequacy of
metainformation. In fact I *used* that metainformation to point out
that the data source in question did not satisfy the open provision
of linked *open* data. In addition I criticized the *inclusion* of the
data set in the *lod cloud diagram* because of this lack of openness
and because the actual content of that resource didn't resemble any
data in the resource that it was 

Re: Contd: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Alan Ruttenberg
On Sun, Sep 5, 2010 at 12:25 PM, Kingsley Idehen kide...@openlinksw.com wrote:

 All,

 See: http://www.ckan.net/group/lod

 Why: Linking Open Data?

 It should be: Linked Open Data . Or just: Linked Data (bearing in all the 
 data sets might not be open).

The name and description goes back the SWEO interest group, which was
called that. If you take the open out of it (which I would oppose)
then I don't think you should reference the interest group, since
their interest was in open.

http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

-Alan



Re: Contd: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-05 Thread Kingsley Idehen

 On 9/5/10 6:30 PM, Alan Ruttenberg wrote:

On Sun, Sep 5, 2010 at 12:25 PM, Kingsley Idehenkide...@openlinksw.com  wrote:

All,

See: http://www.ckan.net/group/lod

Why: Linking Open Data?

It should be: Linked Open Data . Or just: Linked Data (bearing in all the data 
sets might not be open).

The name and description goes back the SWEO interest group, which was
called that. If you take the open out of it (which I would oppose)
then I don't think you should reference the interest group, since
their interest was in open.

http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

-Alan



Alan,

It's the Linking rather than Linked aspect I am concerned about.

LOD should stand for: Linked Open Data, not Linking Open Data.

Like most technical projects, Linking Open Data is a misnomer :-)

--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen







Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Dan Brickley
On Thu, Sep 2, 2010 at 8:10 PM, Anja Jentzsch a...@anjeve.de wrote:
 Hi all,

 we are in the process of drawing the next version of the LOD cloud diagram.
 This time it is likely to contain around 180 datasets altogether having a
 size of around 20 billion RDF triples.

 For drawing the next version of the LOD cloud, we have started to collect
 meta-information about the datasets to be included on CKAN, a registry of
 open data and content packages provided by the Open Knowledge Foundation.

 The list of datasets about which we have already collected information is be
 found here:

 http://www4.wiwiss.fu-berlin.de/lodcloud/

 In addition to basic meta-information about a dataset such as its size and
 the number of links pointing at other datasets, we also collect additional
 meta-information about the license of the dataset, alternative access
 options like SPARQL endpoints or dataset dumps, and whether there exist a
 voiD description of the dataset or a Semantic Web Sitemap.

 So if your dataset is not listed yet and you want to have it included into
 the next version of the LOD cloud, please add it to CKAN until next
 Wednesday (September 8th, 2010).

 Also, if we have collected wrong information about your dataset or if your
 dataset is only partially described up till now, it would be great if you
 could add the missing information.

 Guidelines about how to add datasets to CKAN as well as about the tags that
 we are using to annotate the datasets are found here:
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

 We thank all contributors in advance for their input and help, which
 hopefully will allow us to draw the next version of the LOD cloud as
 accurate as possible.

This is great! Glad to see this being updated :)

One thing I would love in the next revision is for FOAF to also be
presented as a vocabulary, rather than as if it were itself a distinct
dataset. While there are databases that expose as FOAF (LiveJournal
etc.), and also a reasonable number of independently published 'FOAF
files', the technical core of FOAF is really the vocabulary and the
habit of linking things together. Having a FOAF 'blob' is great and
all, but it doesn't help people understand that FOAF is used as a
vocabulary by various of the other blobs too. And beyond FOAF, I'm
wondering how we can visually represent the use of eg. Music Ontology,
or Dublin Core, or Creative Commons vocabularies across different
regions of the cloud. Maybe (later :) someone could make a view where
each blob is a pie-chart showing which vocabularies it uses?

As a vocabulary manager, it is pretty hard to understand the costs and
benefits of possible changes to a widely deployed RDF vocabulary. I'm
sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same
regarding the Dublin Core terms. So if there could be some view of the
new cloud diagram that showed us which blobs (er, datasets) used which
vocabulary (and which terms), that would be really wonderful. On the
Dublin Core side, it would be fascinating to see which datasets are
using http://purl.org/dc/elements/1.1/ and which are using
http://purl.org/dc/terms/ (and which are using both). Similarly with
FOAF, I'd like to understand common deployment patterns better.  I
expect other vocab managers and dataset publishersare in a similar
situation, and would appreciate a map of the wider territory, so they
know how to fit in with trends and conventions, or what missing pieces
of vocabulary might need more work...

Thanks for any thoughts,

Dan



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Anja Jentzsch
Hi Alan,

CKAN is a repository for all kinds of datasets. Even if datasets are not open 
or only for non-commercial use, they can be listed and information on licensing 
can be noted (Other - Closed, e.g.). This is still a valuable information.
If no license is specified or we did not find the license information, CKAN 
lists the datasets as not open.
Leigh Dodds had a closer look at the licenses of the LOD datasets some time ago 
[1]. It is sad but true that only about 23% of all datasets come along with a 
clearly defined license.

Hopefully data publishers will more clearly state the licenses along with their 
datasets to encourage people to use their data.

Cheers,
Anja

[1] 
http://iswc2009.semanticweb.org/wiki/index.php/ISWC_2009_Tutorials/Legal_and_Social_Frameworks_for_Sharing_Data_on_the_Web#Slides

On 03.09.2010 20:43, Alan Ruttenberg wrote:
 I think you should consider having some better quality control and
 standards around this, as I feel it is somewhat misleading. For
 example (and this is one of several), consider CAS which is named in
 the diagram. I don't consider the contents of that set to include any
 data. Here is an example:
 
 http://cu.bio2rdf.org/cas:921-60-8
 
 Subject   
 http://bio2rdf.org/cas:921-60-8
 
 Predicate Object
 http://bio2rdf.org/bio2rdf_resource:url   
 http://bio2rdf.org/html/cas:921-60-8
 (Non-RDF URI)
 http://www.w3.org/2002/07/owl#sameAs  http://cas.bio2rdf.org/cas:921-60-8
 (External link)
 
 This is content free.
 
 In addition, the documentation of that set says it is not open:
 http://ckan.net/package/bio2rdf-cas
 
 Although this URI might be used to link somehow, in my opinion it is
 misleading to call this collection a linked open *data* set. Further,
 including it will do damage to LOD reputation if anyone actually looks
 past that diagram to see what is really there.
 
 Sincerely,
 
 Alan Ruttenberg
 
 
 On Fri, Sep 3, 2010 at 2:00 PM, Jonathan Grayjonathan.g...@okfn.org  wrote:
 FYI, we blogged this here:
 
  
 http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/
 
 All are, of course, most welcone to join ckan-discuss list if there
 are any specific suggestions for features we should add:
 
  http://lists.okfn.org/mailman/listinfo/ckan-discuss
 
 We will be continuing to develop CKAN's support for LOD/semantic web
 technologies over the coming months (and years)! ;-)
 
 On Fri, Sep 3, 2010 at 5:03 PM, Leigh Doddsleigh.do...@talis.com  wrote:
 Hi Chris, Anja
 
 On 3 September 2010 15:17, Chris Bizerch...@bizer.de  wrote:
 In theory, the list is automatically updated with data from CKAN.
 
 But as the CKAN server is overloaded today, the list is currently corrupted
 and only shows a fraction of the datasets.
 
 We hope that the issue is solved in the next hours!
 
 Thanks for the confirmation!
 
 Cheers,
 
 L.
 
 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com
 
 
 
 
 
 --
 Jonathan Gray
 
 Community Coordinator
 The Open Knowledge Foundation
 http://blog.okfn.org
 
 http://twitter.com/jwyg
 http://identi.ca/jwyg
 
 




Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Thomas Baker
On Sat, Sep 04, 2010 at 11:32:59AM +0200, Dan Brickley wrote:
 As a vocabulary manager, it is pretty hard to understand the costs and
 benefits of possible changes to a widely deployed RDF vocabulary. I'm
 sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same
 regarding the Dublin Core terms. So if there could be some view of the
 new cloud diagram that showed us which blobs (er, datasets) used which
 vocabulary (and which terms), that would be really wonderful. On the
 Dublin Core side, it would be fascinating to see which datasets are
 using http://purl.org/dc/elements/1.1/ and which are using
 http://purl.org/dc/terms/ (and which are using both). Similarly with
 FOAF, I'd like to understand common deployment patterns better.  I
 expect other vocab managers and dataset publishersare in a similar
 situation, and would appreciate a map of the wider territory, so they
 know how to fit in with trends and conventions, or what missing pieces
 of vocabulary might need more work...

+1 

Data on common deployment patterns would be great!

If connecting vocabularies with specific datasets were
visually too ambitious, then as a first approximation there
could perhaps be a separate vocabulary cloud reflecting the
relative numbers of triples using the various vocabularies,
along with links reflecting the co-occurrence of specific
vocabularies with others within datasets -- a broad-brush,
aggregate view of vocabulary deployment patterns.

-- 
Tom Baker tba...@tbaker.de



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Alan Ruttenberg
On Sat, Sep 4, 2010 at 8:35 AM, Anja Jentzsch a...@anjeve.de wrote:
 Hi Alan,

 CKAN is a repository for all kinds of datasets. Even if datasets are not open 
 or only for non-commercial use, they can be listed and information on 
 licensing can be noted (Other - Closed, e.g.). This is still a valuable 
 information.

Hello Anja,

My comment was not a commentary on CKAN, it was a comment on specific
data set and it's relation to the LOD cloud - please have a closer
read.

However, now that you mention it, the opening line on the CKAN website
says: CKAN is a registry of open data and content packages. The
words open data and content are linked to
http://www.opendefinition.org/ which explains what open means (it does
not mean closed).

So one of two things should be fixed with CKAN - either the statement
on the front page should be changed to make it clear that it also
registers closed data, or the closed data entries should be expunged.

 If no license is specified or we did not find the license information, CKAN 
 lists the datasets as not open.

Same comment re: having CKAN present a consistent view of what it does.

 Leigh Dodds had a closer look at the licenses of the LOD datasets some time 
 ago [1]. It is sad but true that only about 23% of all datasets come along 
 with a clearly defined license.

Yes, unfortunate. A similar audit should be done for the sets that are
named on the LOD (also open) cloud.

 Hopefully data publishers will more clearly state the licenses along with 
 their datasets to encourage people to use their data.

Here we agree, and part of my work is doing exactly that.

Regards,
Alan


 Cheers,
 Anja

 [1] 
 http://iswc2009.semanticweb.org/wiki/index.php/ISWC_2009_Tutorials/Legal_and_Social_Frameworks_for_Sharing_Data_on_the_Web#Slides

 On 03.09.2010 20:43, Alan Ruttenberg wrote:
 I think you should consider having some better quality control and
 standards around this, as I feel it is somewhat misleading. For
 example (and this is one of several), consider CAS which is named in
 the diagram. I don't consider the contents of that set to include any
 data. Here is an example:

 http://cu.bio2rdf.org/cas:921-60-8

 Subject
 http://bio2rdf.org/cas:921-60-8

 Predicate     Object
 http://bio2rdf.org/bio2rdf_resource:url       
 http://bio2rdf.org/html/cas:921-60-8
 (Non-RDF URI)
 http://www.w3.org/2002/07/owl#sameAs  http://cas.bio2rdf.org/cas:921-60-8
 (External link)

 This is content free.

 In addition, the documentation of that set says it is not open:
 http://ckan.net/package/bio2rdf-cas

 Although this URI might be used to link somehow, in my opinion it is
 misleading to call this collection a linked open *data* set. Further,
 including it will do damage to LOD reputation if anyone actually looks
 past that diagram to see what is really there.

 Sincerely,

 Alan Ruttenberg


 On Fri, Sep 3, 2010 at 2:00 PM, Jonathan Grayjonathan.g...@okfn.org  wrote:
 FYI, we blogged this here:

  http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/

 All are, of course, most welcone to join ckan-discuss list if there
 are any specific suggestions for features we should add:

  http://lists.okfn.org/mailman/listinfo/ckan-discuss

 We will be continuing to develop CKAN's support for LOD/semantic web
 technologies over the coming months (and years)! ;-)

 On Fri, Sep 3, 2010 at 5:03 PM, Leigh Doddsleigh.do...@talis.com  wrote:
 Hi Chris, Anja

 On 3 September 2010 15:17, Chris Bizerch...@bizer.de  wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently 
 corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

 Thanks for the confirmation!

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com





 --
 Jonathan Gray

 Community Coordinator
 The Open Knowledge Foundation
 http://blog.okfn.org

 http://twitter.com/jwyg
 http://identi.ca/jwyg







AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Chris Bizer
Hi Alan,

 I think you should consider having some better quality control

and

 Yes, unfortunate. A similar audit should be done for the sets 
 that are named on the LOD (also open) cloud.

LOD is an open community effort to which everybody can contribute.

So rather than to criticize the work that other people do on collecting
meta-information about the datasets in the LOD cloud, you are more than
welcome to quality-control 20 billion triples.

Best,

Chris
  

-Ursprüngliche Nachricht-
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag
von Alan Ruttenberg
Gesendet: Samstag, 4. September 2010 18:47
An: Anja Jentzsch
Cc: public-lod@w3.org; Leigh Dodds; Chris Bizer; Jonathan Gray
Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 8:35 AM, Anja Jentzsch a...@anjeve.de wrote:
 Hi Alan,

 CKAN is a repository for all kinds of datasets. Even if datasets are not
open or only for non-commercial use, they can be listed and information on
licensing can be noted (Other - Closed, e.g.). This is still a valuable
information.

Hello Anja,

My comment was not a commentary on CKAN, it was a comment on specific
data set and it's relation to the LOD cloud - please have a closer
read.

However, now that you mention it, the opening line on the CKAN website
says: CKAN is a registry of open data and content packages. The
words open data and content are linked to
http://www.opendefinition.org/ which explains what open means (it does
not mean closed).

So one of two things should be fixed with CKAN - either the statement
on the front page should be changed to make it clear that it also
registers closed data, or the closed data entries should be expunged.

 If no license is specified or we did not find the license information,
CKAN lists the datasets as not open.

Same comment re: having CKAN present a consistent view of what it does.

 Leigh Dodds had a closer look at the licenses of the LOD datasets some
time ago [1]. It is sad but true that only about 23% of all datasets come
along with a clearly defined license.

Yes, unfortunate. A similar audit should be done for the sets that are
named on the LOD (also open) cloud.

 Hopefully data publishers will more clearly state the licenses along with
their datasets to encourage people to use their data.

Here we agree, and part of my work is doing exactly that.

Regards,
Alan


 Cheers,
 Anja

 [1]
http://iswc2009.semanticweb.org/wiki/index.php/ISWC_2009_Tutorials/Legal_and
_Social_Frameworks_for_Sharing_Data_on_the_Web#Slides

 On 03.09.2010 20:43, Alan Ruttenberg wrote:
 I think you should consider having some better quality control and
 standards around this, as I feel it is somewhat misleading. For
 example (and this is one of several), consider CAS which is named in
 the diagram. I don't consider the contents of that set to include any
 data. Here is an example:

 http://cu.bio2rdf.org/cas:921-60-8

 Subject
 http://bio2rdf.org/cas:921-60-8

 Predicate     Object
 http://bio2rdf.org/bio2rdf_resource:url      
http://bio2rdf.org/html/cas:921-60-8
 (Non-RDF URI)
 http://www.w3.org/2002/07/owl#sameAs  http://cas.bio2rdf.org/cas:921-60-8
 (External link)

 This is content free.

 In addition, the documentation of that set says it is not open:
 http://ckan.net/package/bio2rdf-cas

 Although this URI might be used to link somehow, in my opinion it is
 misleading to call this collection a linked open *data* set. Further,
 including it will do damage to LOD reputation if anyone actually looks
 past that diagram to see what is really there.

 Sincerely,

 Alan Ruttenberg


 On Fri, Sep 3, 2010 at 2:00 PM, Jonathan Grayjonathan.g...@okfn.org
 wrote:
 FYI, we blogged this here:


 http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-
based-on-ckan/

 All are, of course, most welcone to join ckan-discuss list if there
 are any specific suggestions for features we should add:

  http://lists.okfn.org/mailman/listinfo/ckan-discuss

 We will be continuing to develop CKAN's support for LOD/semantic web
 technologies over the coming months (and years)! ;-)

 On Fri, Sep 3, 2010 at 5:03 PM, Leigh Doddsleigh.do...@talis.com
 wrote:
 Hi Chris, Anja

 On 3 September 2010 15:17, Chris Bizerch...@bizer.de  wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently
corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

 Thanks for the confirmation!

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com





 --
 Jonathan Gray

 Community Coordinator
 The Open Knowledge Foundation
 http://blog.okfn.org

 http://twitter.com/jwyg
 http://identi.ca/jwyg








AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Chris Bizer
Hi Dan,

 
 This is great! Glad to see this being updated :)

 One thing I would love in the next revision is for FOAF to also be
 presented as a vocabulary, rather than as if it were itself a distinct
 dataset. While there are databases that expose as FOAF (LiveJournal
 etc.), and also a reasonable number of independently published 'FOAF
 files', the technical core of FOAF is really the vocabulary and the
 habit of linking things together. Having a FOAF 'blob' is great and
 all, but it doesn't help people understand that FOAF is used as a
 vocabulary by various of the other blobs too. 

Yes, we also felt that having a blob is a bit misleading and were thus
thinking about using a cloud icon for FOAF and SIOC to reflect the fact that
the blob actually consists of many separate files on many different servers.


Beside, we have started to tag datsets in CKAN with the vocabularies that
they use. So, ideally all datasets that use FOAF should be tagged with
format-foaf and people can use this data via the CKAN API to draw any
visualization of the LOD cloud they like.

 And beyond FOAF, I'm
 wondering how we can visually represent the use of eg. Music Ontology,
 or Dublin Core, or Creative Commons vocabularies across different
 regions of the cloud. Maybe (later :) someone could make a view where
 each blob is a pie-chart showing which vocabularies it uses?

Interesting idea. I would also love to see this.

Maybe we can give it a try, otherwise of course everybody is invited to get
the data from the CKAN API and visualize it in any way he thinks is
interesting.

 As a vocabulary manager, it is pretty hard to understand the costs and
 benefits of possible changes to a widely deployed RDF vocabulary. I'm
 sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same
 regarding the Dublin Core terms. So if there could be some view of the
 new cloud diagram that showed us which blobs (er, datasets) used which
 vocabulary (and which terms), that would be really wonderful. On the
 Dublin Core side, it would be fascinating to see which datasets are
 using http://purl.org/dc/elements/1.1/ and which are using
 http://purl.org/dc/terms/ (and which are using both). Similarly with
 FOAF, I'd like to understand common deployment patterns better.  I
 expect other vocab managers and dataset publishersare in a similar
 situation, and would appreciate a map of the wider territory, so they
 know how to fit in with trends and conventions, or what missing pieces
 of vocabulary might need more work...

Yes, having data about usage patterns would be great. 

But I guess there are also limits to the meta-data that people can gather
manually. So the best would be if somebody would run a crawler and extract
meta-data about vocabulary usage and other usage pattern directly from the
LOD datasets. Nobody has done this yet but hopefully somebody will soon
start doing this.

Cheers,

Chris


 Thanks for any thoughts,

 Dan




Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Alan Ruttenberg
On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer ch...@bizer.de wrote:
 Hi Alan,

 I think you should consider having some better quality control

 and

 Yes, unfortunate. A similar audit should be done for the sets
 that are named on the LOD (also open) cloud.

 LOD is an open community effort to which everybody can contribute.

 So rather than to criticize the work that other people do on collecting
 meta-information about the datasets in the LOD cloud, you are more than
 welcome to quality-control 20 billion triples.

I have just spent some time evaluating one source and reported to you
the result. Perhaps you might act on this investment in time and thank
me for doing so. You might find that the result was myself and more
people doing such quality control.

-Alan


 Best,

 Chris


 -Ursprüngliche Nachricht-
 Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag
 von Alan Ruttenberg
 Gesendet: Samstag, 4. September 2010 18:47
 An: Anja Jentzsch
 Cc: public-lod@w3.org; Leigh Dodds; Chris Bizer; Jonathan Gray
 Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
 that your dataset is included.

 On Sat, Sep 4, 2010 at 8:35 AM, Anja Jentzsch a...@anjeve.de wrote:
 Hi Alan,

 CKAN is a repository for all kinds of datasets. Even if datasets are not
 open or only for non-commercial use, they can be listed and information on
 licensing can be noted (Other - Closed, e.g.). This is still a valuable
 information.

 Hello Anja,

 My comment was not a commentary on CKAN, it was a comment on specific
 data set and it's relation to the LOD cloud - please have a closer
 read.

 However, now that you mention it, the opening line on the CKAN website
 says: CKAN is a registry of open data and content packages. The
 words open data and content are linked to
 http://www.opendefinition.org/ which explains what open means (it does
 not mean closed).

 So one of two things should be fixed with CKAN - either the statement
 on the front page should be changed to make it clear that it also
 registers closed data, or the closed data entries should be expunged.

 If no license is specified or we did not find the license information,
 CKAN lists the datasets as not open.

 Same comment re: having CKAN present a consistent view of what it does.

 Leigh Dodds had a closer look at the licenses of the LOD datasets some
 time ago [1]. It is sad but true that only about 23% of all datasets come
 along with a clearly defined license.

 Yes, unfortunate. A similar audit should be done for the sets that are
 named on the LOD (also open) cloud.

 Hopefully data publishers will more clearly state the licenses along with
 their datasets to encourage people to use their data.

 Here we agree, and part of my work is doing exactly that.

 Regards,
 Alan


 Cheers,
 Anja

 [1]
 http://iswc2009.semanticweb.org/wiki/index.php/ISWC_2009_Tutorials/Legal_and
 _Social_Frameworks_for_Sharing_Data_on_the_Web#Slides

 On 03.09.2010 20:43, Alan Ruttenberg wrote:
 I think you should consider having some better quality control and
 standards around this, as I feel it is somewhat misleading. For
 example (and this is one of several), consider CAS which is named in
 the diagram. I don't consider the contents of that set to include any
 data. Here is an example:

 http://cu.bio2rdf.org/cas:921-60-8

 Subject
 http://bio2rdf.org/cas:921-60-8

 Predicate     Object
 http://bio2rdf.org/bio2rdf_resource:url
 http://bio2rdf.org/html/cas:921-60-8
 (Non-RDF URI)
 http://www.w3.org/2002/07/owl#sameAs  http://cas.bio2rdf.org/cas:921-60-8
 (External link)

 This is content free.

 In addition, the documentation of that set says it is not open:
 http://ckan.net/package/bio2rdf-cas

 Although this URI might be used to link somehow, in my opinion it is
 misleading to call this collection a linked open *data* set. Further,
 including it will do damage to LOD reputation if anyone actually looks
 past that diagram to see what is really there.

 Sincerely,

 Alan Ruttenberg


 On Fri, Sep 3, 2010 at 2:00 PM, Jonathan Grayjonathan.g...@okfn.org
  wrote:
 FYI, we blogged this here:


  http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-
 based-on-ckan/

 All are, of course, most welcone to join ckan-discuss list if there
 are any specific suggestions for features we should add:

  http://lists.okfn.org/mailman/listinfo/ckan-discuss

 We will be continuing to develop CKAN's support for LOD/semantic web
 technologies over the coming months (and years)! ;-)

 On Fri, Sep 3, 2010 at 5:03 PM, Leigh Doddsleigh.do...@talis.com
  wrote:
 Hi Chris, Anja

 On 3 September 2010 15:17, Chris Bizerch...@bizer.de  wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently
 corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

 Thanks for the confirmation!

 Cheers,

 L.

 --
 Leigh 

Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-04 Thread Alan Ruttenberg
On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer ch...@bizer.de wrote:
 So rather than to criticize the work that other people do on collecting
 meta-information about the datasets in the LOD cloud

Did you read what I wrote? I made no comment on the adequacy of
metainformation. In fact I *used* that metainformation to point out
that the data source in question did not satisfy the open provision
of linked *open* data. In addition I criticized the *inclusion* of the
data set in the *lod cloud diagram* because of this lack of openness
and because the actual content of that resource didn't resemble any
data in the resource that it was derived from (a registry of
information about chemical compounds), suggesting that it would hurt
the LOD effort as inclusion would be a kind of false advertising.

-Alan



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Chris Bizer
Hi Ted,

 But please ... this time, will there be any effort to make visible
 the clustering within the LOD Cloud?  This seems to me one of the
 best ways to encourage data set publishers to link out -- and that
 *is* important to grow the utility of the *overall* data set.

Hmm yes, maybe we should have a nice tidy-looking version for slides and in
addition a second more educational version ;-)

A good thing about the CKAN collection is that everybody can access the data
via the API and then convert is to a SVG graphic showing any aspect of the
data people are interested in. For instance, I would also like to see an
educational version of the cloud showing which dataset are properly licensed
and which are not.

Best,

Chris


-Ursprüngliche Nachricht-
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im Auftrag
von Ted Thibodeau Jr
Gesendet: Donnerstag, 2. September 2010 22:42
An: Anja Jentzsch
Cc: public-lod@w3.org
Betreff: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.


On Sep 2, 2010, at 02:10 PM, Anja Jentzsch wrote:

 Hi all,
 
 we are in the process of drawing the next version of the LOD cloud
diagram. This time it is likely to contain around 180 datasets altogether
having a size of around 20 billion RDF triples.



Cool!

But please ... this time, will there be any effort to make visible
the clustering within the LOD Cloud?  This seems to me one of the
best ways to encourage data set publishers to link out -- and that
*is* important to grow the utility of the *overall* data set.

To date, the only graphic I've seen which shows just how little
overall interconnectedness there is (was) in the LOD Cloud is my 
own ... which someone has long since removed from display on the 
EWC wiki page with the Cloud-like graphic, and which is certainly 
well outdated, but which is still found here --

   http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html

Now, granted, mine didn't make the bubbles into a pretty cloud-like 
overall shape -- but it did reveal that most data sets (e.g., flickr
wrappr, Magnatune, Audioscrobbler) were only connecting to one or 
two others -- and I think that's important to see, just as it is 
to see which data sets several or many others connected to (e.g., 
DBpedia, Geonames, Musicbrainz), and which sets connected out to 
several or many others (e.g., Revyu, Linked MDB, the not-really-a-
data-set cloud of FOAF Profiles)...

Be seeing you,

Ted




--
A: Yes.  http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
Evangelism  Support //mailto:tthibod...@openlinksw.com
 //  http://twitter.com/TallTed
OpenLink Software, Inc.  //  http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
 http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs  http://www.openlinksw.com/weblogs/virtuoso/
   http://www.openlinksw.com/blog/~kidehen/
Universal Data Access and Virtual Database Technology Providers








Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Christophe Guéret

 Hi Ted,


But please ... this time, will there be any effort to make visible
the clustering within the LOD Cloud?  This seems to me one of the
best ways to encourage data set publishers to link out -- and that
*is* important to grow the utility of the *overall* data set.

To date, the only graphic I've seen which shows just how little
overall interconnectedness there is (was) in the LOD Cloud is my
own ... which someone has long since removed from display on the
EWC wiki page with the Cloud-like graphic, and which is certainly
well outdated, but which is still found here --

http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html



As for the clustering, you can also have a look at the picture we
plotted with a network analysis tool: http://blog.larkc.eu/?p=1941

Our network file that we created out of the data on the ESW wiki
will also be outdated as soon as the new picture is completed.
I'll make a new version of it with the data on:
http://www4.wiwiss.fu-berlin.de/lodcloud/
With such a file, everyone will be able to plot its own cloud picture
and highlight the different structures he wants to see in it.


Christophe



--
Dr. Christophe Guéret (cgue...@few.vu.nl)
http://cgueret.net
Postdoc working on SOKS (http://www.few.vu.nl/soks)
Knowledge Representation  Reasoning Group
Computational Intelligence Group
Department of Computer Science, AI
VU University Amsterdam

attachment: cgueret.vcf

AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Chris Bizer
Hi Leigh,

In theory, the list is automatically updated with data from CKAN.

But as the CKAN server is overloaded today, the list is currently corrupted
and only shows a fraction of the datasets.

We hope that the issue is solved in the next hours!

Cheers,

Chris
 

 -Ursprüngliche Nachricht-
 Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im
 Auftrag von Leigh Dodds
 Gesendet: Freitag, 3. September 2010 16:10
 An: Anja Jentzsch
 Cc: public-lod@w3.org
 Betreff: Re: Next version of the LOD cloud diagram. Please provide input,
 so that your dataset is included.
 
 Hi,
 
  The list of datasets about which we have already collected information
  is be found here:
 
  http://www4.wiwiss.fu-berlin.de/lodcloud/
 
 Is that page manually maintained or is it derived from the data in CKAN?
 
 For example I've just added the missing data to my NASA dataset,
 including notes on how it links to dbpedia. This should ensure there's
 enough links to get it onto the diagram. However I'm not seeing the
 page update, so assume its manual.
 
 Just want to be clear on the process, i.e. will all CKAN updates
 automatically get rolled in?
 
 Cheers,
 
 L.
 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com




Re: AW: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Kingsley Idehen

 On 9/3/10 10:17 AM, Chris Bizer wrote:

Hi Leigh,

In theory, the list is automatically updated with data from CKAN.

But as the CKAN server is overloaded today, the list is currently corrupted
and only shows a fraction of the datasets.

We hope that the issue is solved in the next hours!

Cheers,

Chris


Chris,

Is CKAN now delivering RDF based descriptions of its particular data 
space (site or container) and the data items it hosts? I was hoping to 
see RDFa in their HTML  pages, at the very least.


Kingsley




-Ursprüngliche Nachricht-
Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im
Auftrag von Leigh Dodds
Gesendet: Freitag, 3. September 2010 16:10
An: Anja Jentzsch
Cc: public-lod@w3.org
Betreff: Re: Next version of the LOD cloud diagram. Please provide input,
so that your dataset is included.

Hi,


The list of datasets about which we have already collected information
is be found here:

http://www4.wiwiss.fu-berlin.de/lodcloud/

Is that page manually maintained or is it derived from the data in CKAN?

For example I've just added the missing data to my NASA dataset,
including notes on how it links to dbpedia. This should ensure there's
enough links to get it onto the diagram. However I'm not seeing the
page update, so assume its manual.

Just want to be clear on the process, i.e. will all CKAN updates
automatically get rolled in?

Cheers,

L.
--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com






--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Anja Jentzsch

Hi Leigh,

On 03.09.2010 16:09, Leigh Dodds wrote:

Hi,


The list of datasets about which we have already collected information
is be found here:

http://www4.wiwiss.fu-berlin.de/lodcloud/


Is that page manually maintained or is it derived from the data in CKAN?


the page is updated automatically but that update process stopped 
temporarily because of some work on the CKAN API cache.



For example I've just added the missing data to my NASA dataset,
including notes on how it links to dbpedia. This should ensure there's
enough links to get it onto the diagram. However I'm not seeing the
page update, so assume its manual.


The datasets will only appear in the LOD cloud if they have enough 
information provided. We will manually assign them to the lodcloud group 
on CKAN after checking this.



Just want to be clear on the process, i.e. will all CKAN updates
automatically get rolled in?


They will.

Cheers,
Anja


Cheers,

L.





Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Leigh Dodds
Hi Chris, Anja

On 3 September 2010 15:17, Chris Bizer ch...@bizer.de wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

Thanks for the confirmation!

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Jonathan Gray
FYI, we blogged this here:

  
http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/

All are, of course, most welcone to join ckan-discuss list if there
are any specific suggestions for features we should add:

  http://lists.okfn.org/mailman/listinfo/ckan-discuss

We will be continuing to develop CKAN's support for LOD/semantic web
technologies over the coming months (and years)! ;-)

On Fri, Sep 3, 2010 at 5:03 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi Chris, Anja

 On 3 September 2010 15:17, Chris Bizer ch...@bizer.de wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

 Thanks for the confirmation!

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com





-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://blog.okfn.org

http://twitter.com/jwyg
http://identi.ca/jwyg



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Alan Ruttenberg
I think you should consider having some better quality control and
standards around this, as I feel it is somewhat misleading. For
example (and this is one of several), consider CAS which is named in
the diagram. I don't consider the contents of that set to include any
data. Here is an example:

http://cu.bio2rdf.org/cas:921-60-8

Subject 
http://bio2rdf.org/cas:921-60-8

Predicate   Object
http://bio2rdf.org/bio2rdf_resource:url http://bio2rdf.org/html/cas:921-60-8
(Non-RDF URI)
http://www.w3.org/2002/07/owl#sameAshttp://cas.bio2rdf.org/cas:921-60-8
(External link)

This is content free.

In addition, the documentation of that set says it is not open:
http://ckan.net/package/bio2rdf-cas

Although this URI might be used to link somehow, in my opinion it is
misleading to call this collection a linked open *data* set. Further,
including it will do damage to LOD reputation if anyone actually looks
past that diagram to see what is really there.

Sincerely,

Alan Ruttenberg


On Fri, Sep 3, 2010 at 2:00 PM, Jonathan Gray jonathan.g...@okfn.org wrote:
 FYI, we blogged this here:

  http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/

 All are, of course, most welcone to join ckan-discuss list if there
 are any specific suggestions for features we should add:

  http://lists.okfn.org/mailman/listinfo/ckan-discuss

 We will be continuing to develop CKAN's support for LOD/semantic web
 technologies over the coming months (and years)! ;-)

 On Fri, Sep 3, 2010 at 5:03 PM, Leigh Dodds leigh.do...@talis.com wrote:
 Hi Chris, Anja

 On 3 September 2010 15:17, Chris Bizer ch...@bizer.de wrote:
 In theory, the list is automatically updated with data from CKAN.

 But as the CKAN server is overloaded today, the list is currently corrupted
 and only shows a fraction of the datasets.

 We hope that the issue is solved in the next hours!

 Thanks for the confirmation!

 Cheers,

 L.

 --
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com





 --
 Jonathan Gray

 Community Coordinator
 The Open Knowledge Foundation
 http://blog.okfn.org

 http://twitter.com/jwyg
 http://identi.ca/jwyg





Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-03 Thread Kingsley Idehen

 On 9/3/10 2:00 PM, Jonathan Gray wrote:

FYI, we blogged this here:

   
http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/

All are, of course, most welcone to join ckan-discuss list if there
are any specific suggestions for features we should add:

   http://lists.okfn.org/mailman/listinfo/ckan-discuss

We will be continuing to develop CKAN's support for LOD/semantic web
technologies over the coming months (and years)! ;-)


Jonathan,

What's the situation re. RDFa in your HTML Pages?

In short, what are the RDF options re. resource descriptors hosted in 
your data space at the current time?


Kingsley


On Fri, Sep 3, 2010 at 5:03 PM, Leigh Doddsleigh.do...@talis.com  wrote:

Hi Chris, Anja

On 3 September 2010 15:17, Chris Bizerch...@bizer.de  wrote:

In theory, the list is automatically updated with data from CKAN.

But as the CKAN server is overloaded today, the list is currently corrupted
and only shows a fraction of the datasets.

We hope that the issue is solved in the next hours!

Thanks for the confirmation!

Cheers,

L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com








--

Regards,

Kingsley Idehen 
President  CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-02 Thread Anja Jentzsch

Hi all,

we are in the process of drawing the next version of the LOD cloud 
diagram. This time it is likely to contain around 180 datasets 
altogether having a size of around 20 billion RDF triples.


For drawing the next version of the LOD cloud, we have started to 
collect meta-information about the datasets to be included on CKAN, a 
registry of open data and content packages provided by the Open 
Knowledge Foundation.


The list of datasets about which we have already collected information 
is be found here:


http://www4.wiwiss.fu-berlin.de/lodcloud/

In addition to basic meta-information about a dataset such as its size 
and the number of links pointing at other datasets, we also collect 
additional meta-information about the license of the dataset, 
alternative access options like SPARQL endpoints or dataset dumps, and 
whether there exist a voiD description of the dataset or a Semantic Web 
Sitemap.


So if your dataset is not listed yet and you want to have it included 
into the next version of the LOD cloud, please add it to CKAN until next 
Wednesday (September 8th, 2010).


Also, if we have collected wrong information about your dataset or if 
your dataset is only partially described up till now, it would be great 
if you could add the missing information.


Guidelines about how to add datasets to CKAN as well as about the tags 
that we are using to annotate the datasets are found here:

http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

We thank all contributors in advance for their input and help, which 
hopefully will allow us to draw the next version of the LOD cloud as 
accurate as possible.


Cheers,

Anja Jentzsch, Richard Cyganiak, Chris Bizer



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-02 Thread Ted Thibodeau Jr

On Sep 2, 2010, at 02:10 PM, Anja Jentzsch wrote:

 Hi all,
 
 we are in the process of drawing the next version of the LOD cloud diagram. 
 This time it is likely to contain around 180 datasets altogether having a 
 size of around 20 billion RDF triples.



Cool!

But please ... this time, will there be any effort to make visible
the clustering within the LOD Cloud?  This seems to me one of the
best ways to encourage data set publishers to link out -- and that
*is* important to grow the utility of the *overall* data set.

To date, the only graphic I've seen which shows just how little
overall interconnectedness there is (was) in the LOD Cloud is my 
own ... which someone has long since removed from display on the 
EWC wiki page with the Cloud-like graphic, and which is certainly 
well outdated, but which is still found here --

   http://virtuoso.openlinksw.com/images/dbpedia-lod-cloud.html

Now, granted, mine didn't make the bubbles into a pretty cloud-like 
overall shape -- but it did reveal that most data sets (e.g., flickr
wrappr, Magnatune, Audioscrobbler) were only connecting to one or 
two others -- and I think that's important to see, just as it is 
to see which data sets several or many others connected to (e.g., 
DBpedia, Geonames, Musicbrainz), and which sets connected out to 
several or many others (e.g., Revyu, Linked MDB, the not-really-a-
data-set cloud of FOAF Profiles)...

Be seeing you,

Ted




--
A: Yes.  http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
Evangelism  Support //mailto:tthibod...@openlinksw.com
 //  http://twitter.com/TallTed
OpenLink Software, Inc.  //  http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
 http://www.openlinksw.com/weblogs/uda/
OpenLink Blogs  http://www.openlinksw.com/weblogs/virtuoso/
   http://www.openlinksw.com/blog/~kidehen/
Universal Data Access and Virtual Database Technology Providers