On 8/18/14 12:35 PM, Michael Brunnbauer wrote:
Hello Chris,

On Mon, Aug 18, 2014 at 12:05:54PM +0200, Christian Bizer wrote:
>Sorry, we are not Google and simply did not have the resources to crawl the 
whole Web and ask for RDF/XML when dereferencing each URL.
See http://www.sengine.info/

We try to crawl 1000 URLs from every site that has fewer than 5000 other sites
on the same IP.

>Alternatively, one could of course search for HTML documents that contain links pointing at RDF/Linked 
Data documents (for instance using <link rel="alternate" type="application/rdf+xml" 
...> in the header part of an HTML document).
[...]
>It would be great if somebody would investigate this deeper and produce a list 
with Linked Data URIs that could be used as seeds for further crawls.
mysql> select rel,count(*) as number from link_rdf group by rel order by number 
desc limit 10;
+--------------------+---------+
| rel                | number  |
+--------------------+---------+
| meta               | 4348067 |
| alternate          | 2080611 |
| alternate meta     |   61169 |
| meta FOAF.MAKER    |   38293 |
| ExportRDF          |   31366 |
| alternate nofollow |   19176 |
| media              |   11568 |
| Skos metadata      |    9364 |
| resourcemap        |    8727 |
|                    |    1708 |
+--------------------+---------+
10 rows in set (28.36 sec)

mysql> select title,count(*) as number from link_rdf group by title order by number 
desc limit 50;
+-------------------------------------------------+---------+
| title                                           | number  |
+-------------------------------------------------+---------+
| Creative Commons                                | 2074663 |
| FOAF                                            | 1805441 |
| RDF+XML                                         |  328872 |
| RSS 1.0                                         |  155568 |
| ICRA labels                                     |  152142 |
| RDF                                             |  151377 |
|                                                 |  144916 |
| Calais RDF                                      |   65550 |
| SIOC                                            |   50826 |
| RDF 1.0                                         |   48448 |
| Dublin Core                                     |   48047 |
| RDF 1.1                                         |   40227 |
| Meta Information                                |   25181 |
| RDF Version                                     |   24449 |
| RDF Version of this post                        |   24356 |
| Items in Collection                             |   23903 |
| RDF Representation                              |   22327 |
| This category listings in RDF                   |   19176 |
| Dublin                                          |   17796 |
| Structured Descriptor Document (RDF/XML format) |   14648 |
| RDF Metadata                                    |   11275 |
| Skos Core                                       |    9364 |
| Structured Description in RDF/XML format        |    8666 |
| Items in Community                              |    7358 |
| notice                                          |    5998 |
| RDF/XML version of this document                |    5677 |
| RDF/XML                                         |    5325 |
| RDF/XML Version                                 |    4531 |
| Get RDF 1.0 Feed                                |    4426 |
| RDF/XML data for this webshop                   |    4063 |
| RDF+XML (VOA3R)                                 |    3887 |
| LG RDF                                          |    3547 |
| Metadata                                        |    3047 |
| Packages involving this user                    |    2914 |
| Product RDF/XML data                            |    2905 |
| DOAP                                            |    2640 |
| Geo                                             |    2593 |
| XML                                             |    2409 |
| Main Page                                       |    1735 |
| This page in RDF (XML)                          |    1573 |
| Public Stream Feed (RSS 1.0)                    |    1541 |
| RDF Description                                 |    1506 |
| Get RDF                                         |    1161 |
| RDF Version of this categorie                   |    1124 |
| unprocessed RDF+XML metadata                    |     993 |
| Dane produktu w formacie RDF/XML                |     990 |
| Supplier RDF/XML data                           |     962 |
| Essay metadata                                  |     758 |
| Dublin Core Metadata                            |     737 |
| rdf:foaf                                        |     730 |
+-------------------------------------------------+---------+
50 rows in set (2 min 22.54 sec)
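
The rel and title values counted above come from <link> elements that advertise an RDF/XML document. As a minimal sketch of how such elements could be pulled out of a crawled page, assuming Python's standard html.parser module: the names RdfLinkCollector and find_rdf_links and the sample HTML are hypothetical and are not the crawler's actual code.

```python
from html.parser import HTMLParser

# Media type that marks a <link> as pointing at an RDF/XML document.
RDF_TYPE = "application/rdf+xml"


class RdfLinkCollector(HTMLParser):
    """Collect (rel, title, href) from <link> elements typed as RDF/XML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        media_type = (attributes.get("type") or "").strip().lower()
        if media_type != RDF_TYPE:
            return
        # Missing attributes are stored as empty strings, which matches
        # the empty rel/title rows seen in the query results.
        self.links.append((
            attributes.get("rel") or "",
            attributes.get("title") or "",
            attributes.get("href") or "",
        ))


def find_rdf_links(html):
    """Return a list of (rel, title, href) tuples found in an HTML string."""
    collector = RdfLinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical sample page, for illustration only.
sample = """<html><head>
<link rel="alternate" type="application/rdf+xml" title="FOAF" href="/foaf.rdf">
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="meta" type="application/rdf+xml" href="/about.rdf">
</head><body></body></html>"""

for rel, title, href in find_rdf_links(sample):
    print(rel, title, href)
```

The href values collected this way are the kind of URLs that could serve as seeds for further crawls; relative ones would still need to be resolved against the page URL.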

Contact me if you are interested.

Do you not have this data in RDF form? Ideally, you should publish this data in a form that's accessible via HTTP lookups (and SPARQL queries). I am sure you can see the irony in the SQL query results presented above :-)

--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

