Martin,

On 23.06.2011 at 10:30, Martin Hepp wrote:

> Sebastian, all:
> The community may not publicly admit it, but: SW and LOD have been BEGGING 
> for adoption for almost a decade. Now, if someone outside of a University 
> project publishes valuable RDF data in a well-above-the-standards way, you 
> make him pay several hundred Euros for traffic just for your ISWC paper.

I am very well aware of the problem of adoption. At the same time, we have a 
similar problem not only in the publication of the data but also in its 
consumption: if we do not let users consume our data even at large scale, what 
use is the data at all? I agree that bombarding a server with crawlers just to 
harvest as many triples as possible, without thinking about their use, is 
stupid. But it will always happen, no matter how many mails we exchange on the 
Linked Data mailing list.
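The two measures the careless crawlers skip (honouring robots.txt and throttling their own requests) are cheap to implement. A minimal sketch in Python, using only the standard library; for a self-contained example the robots.txt is inlined, the user agent name and delay are illustrative, and a real crawler would fetch robots.txt from the target site:

```python
import time
import urllib.robotparser

# A polite crawler checks robots.txt and throttles itself. Here we parse
# an inline robots.txt so the example is self-contained.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

CRAWL_DELAY = 2.0   # seconds between requests; adjust to the site's limits
_last_fetch = 0.0

def may_fetch(url, user_agent="example-ld-crawler"):
    """Return True only if robots.txt allows the URL, sleeping first if the
    crawl delay since the previous request has not yet elapsed."""
    global _last_fetch
    if not rp.can_fetch(user_agent, url):
        return False  # disallowed: skip instead of hammering the server
    wait = CRAWL_DELAY - (time.monotonic() - _last_fetch)
    if wait > 0:
        time.sleep(wait)  # simple request throttling
    _last_fetch = time.monotonic()
    return True
```

A harvester would call `may_fetch` before every HTTP request; everything else about its triple collection stays unchanged.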

My argument is that even the useful applications that will be built on top of 
Linked Data will eventually make the data providers pay real money for 
publishing their data, in the same way that it costs money to publish a website 
on the eyeball Web. Now what is the difference? Probably that people nowadays 
immediately see that the money spent on a website is well spent, while they do 
not yet see how money invested in Linked Data is well spent, because of the 
lack of compelling applications and data use. And now we have a vicious circle: 
how are people going to implement compelling applications if they have no 
access to the data?

Btw, just for the record: for my ISWC paper I did not harvest the Web for RDF; 
instead we wrote a Linked Data server that might eventually help address the 
scalability problems and make it both easier and cheaper to publish Linked Data.

> 
> Quote from one e-mail I received: "I think we are among the Universities 
> running such a crawler. Because we were in a rush for the ISWC deadline, 
> nobody took the time to implement robots.txt and bandwidth throttling. Sorry."
> 
> Stop dreaming.

So stop doing research? ;-)

I am dreaming of providing users with technology that allows them to publish 
their data as Linked Data easily, without needing to care too much about the 
complex issues that come with it: scalability, authentication, and technical 
issues like bandwidth throttling (which can equally well be implemented on the 
server side).
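Server-side throttling of this kind can be sketched, for instance, as a token bucket that every incoming request must pass before the server answers it; the class name, rates, and capacities below are illustrative, not part of any existing server:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter a Linked Data server could apply per
    client before serving a request. Limits here are illustrative."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Consume one token if available; if not, the server would
        reject the request (e.g. with HTTP 429 Too Many Requests)."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would typically keep one bucket per client IP, so a single aggressive crawler is slowed down without affecting well-behaved clients.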

> A technical improvement for the WWW cannot be developed in isolation from the 
> socio-economic environment. I.e., it will lead to nowhere to just work on 
> technical solutions that don't fit the characteristics of the target 
> eco-system, skill-wise, incentive-wise, or complexity-wise, and then wait for 
> the world to pick it up. Unless you want your work to be listed here
> 
>    http://www.mclol.com/funny-articles/most-useless-inventions-ever/
> 
> WebID is a notable exception, because it takes into account exactly those 
> dimensions.
> 
>> And what if in the future 100.000 software agents will access servers? We 
>> will have the scalability issue eventually even without crawlers, so let's 
>> try to solve it. In the eyeball web, there are also crawlers without too 
>> much of a problem, and if Linked Data is to be successful we need to do the 
>> same.
> 
> How do you personally solve the scalability issue for small site-owners who 
> are running a decent service from a basic understanding of HTML, PHP, and 
> MySQL?


By providing a technology like the Apache web server (just for Linked Data), 
which you did not even mention because it is so obvious. Small site-owners 
simply should not need to care at all, because we provide them with the right 
technology that takes away the current problems. We are working on that in 
Salzburg, and many others are working on it as well.


Greetings,

Sebastian
-- 
| Dr. Sebastian Schaffert          [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group          +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg

