Sebastian, all:
The community may not publicly admit it, but SW and LOD have been BEGGING for
adoption for almost a decade. Now, when someone outside of a university project
publishes valuable RDF data in a way well above the standards, you make him pay
several hundred Euros in traffic costs just for your ISWC paper.
Quote from one e-mail I received: "I think we are among the Universities
running such a crawler. Because we were in a rush for the ISWC deadline, nobody
took the time to implement robots.txt and bandwidth throttling. Sorry."
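For the record: respecting robots.txt and throttling requests is maybe a dozen
lines of code. Here is a rough sketch in Python (the user-agent string, the
one-second default delay, and the helper name are placeholders of mine, not
anybody's actual crawler):

import time
import urllib.robotparser
import urllib.request
from urllib.parse import urljoin

USER_AGENT = "ExampleLODCrawler/0.1 (mailto:[email protected])"  # placeholder
DEFAULT_DELAY = 1.0  # seconds between requests to one host; my assumption

def polite_fetch(base_url, paths):
    # Read the site's robots.txt once and obey it for every request.
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(urljoin(base_url, "/robots.txt"))
    robots.read()
    delay = robots.crawl_delay(USER_AGENT) or DEFAULT_DELAY
    pages = {}
    for path in paths:
        url = urljoin(base_url, path)
        if not robots.can_fetch(USER_AGENT, url):
            continue  # skip anything the site owner disallows
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=10) as resp:
            pages[url] = resp.read()
        time.sleep(delay)  # bandwidth throttling: one request per delay period
    return pages

That is essentially all the politeness the quoted crawler skipped.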
Stop dreaming. A technical improvement for the WWW cannot be developed in
isolation from its socio-economic environment. In other words, it leads nowhere
to work only on technical solutions that don't fit the characteristics of the
target ecosystem, skill-wise, incentive-wise, or complexity-wise, and then wait
for the world to pick them up. Unless you want your work to be listed here:
http://www.mclol.com/funny-articles/most-useless-inventions-ever/
WebID is a notable exception, because it takes into account exactly those
dimensions.
> And what if, in the future, 100,000 software agents access servers? We will
> have the scalability issue eventually even without crawlers, so let's try to
> solve it. On the eyeball Web, there are also crawlers without too much of a
> problem, and if Linked Data is to be successful we need to do the same.
How do you personally solve the scalability issue for small site-owners who are
running a decent service from a basic understanding of HTML, PHP, and MySQL?
Best
Martin
On Jun 23, 2011, at 1:08 AM, Sebastian Schaffert wrote:
>
> Am 22.06.2011 um 23:01 schrieb Lin Clark:
>
>> On Wed, Jun 22, 2011 at 9:33 PM, Sebastian Schaffert
>> <[email protected]> wrote:
>>
>> Your complaint sounds to me a bit like "help, too many clients access my
>> data".
>>
>> I'm sure that Martin is really tired of saying this, so I will reiterate for
>> him: It wasn't his data, they weren't his servers. He's speaking on behalf
>> of people who aren't part of our insular community... people who don't have
>> a compelling reason to subsidize a PhD student's Best Paper award with their
>> own dollars and bandwidth.
>
> And what about those companies subsidizing PhD students who write crawlers
> for the normal Web? Like Larry Page in 1998?
>
>>
>> Agents can use Linked Data just fine without firing 150 requests per second
>> at a server. There are TONS of use cases that do not require that kind of
>> server load.
>
> And what if, in the future, 100,000 software agents access servers? We will
> have the scalability issue eventually even without crawlers, so let's try to
> solve it. On the eyeball Web, there are also crawlers without too much of a
> problem, and if Linked Data is to be successful we need to do the same.
>
> Greetings,
>
> Sebastian
> --
> | Dr. Sebastian Schaffert [email protected]
> | Salzburg Research Forschungsgesellschaft http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
>