On 23 Jun 2011, at 13:13, Michael Brunnbauer wrote:
>
> On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote:
>>> config = {
>>> 'Googlebot':['googlebot.com'],
>>> 'Mediapartners-Google':['googlebot.com'],
>>> 'msnbot':['live.com','msn.com','bing.com'],
>>> 'bingbot':['live.com','msn.com','bing.com'],
>>> 'Yahoo! Slurp':['yahoo.com','yahoo.net']
>>> }
>> How does that deal with a DoS query inadvertently or deliberately
>> generated by a SPARQL user agent?
>
> It's part of the solution. It prevents countermeasures from hitting the
> crawlers that are welcome.
>
> How does WebID deal with it, except that it allows more fine-grained ACLs per
> person/agent instead of per DNS domain? WebID is a cool thing and maybe
> crawlers will use it in the future, but Martin needs solutions right now.
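The config quoted above only makes sense together with forward-confirmed reverse DNS, which is how the major search engines recommend verifying their crawlers: reverse-resolve the client IP, check the hostname against the allowed domains, then forward-resolve that hostname back to the IP so a spoofed PTR record is caught. A minimal sketch of that check in Python (the `is_genuine_crawler` helper is illustrative, not from the thread):

```python
import socket

# Map of crawler User-Agent substrings to the DNS domains their
# hosts may resolve to (the config quoted above).
config = {
    'Googlebot': ['googlebot.com'],
    'Mediapartners-Google': ['googlebot.com'],
    'msnbot': ['live.com', 'msn.com', 'bing.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
    'Yahoo! Slurp': ['yahoo.com', 'yahoo.net'],
}

def is_genuine_crawler(user_agent, ip):
    """Forward-confirmed reverse DNS check.

    1. Reverse-resolve the client IP to a hostname.
    2. Require the hostname to end in one of the allowed domains.
    3. Forward-resolve that hostname and require the original IP
       among its addresses, so a spoofed PTR record is rejected.
    """
    domains = next((d for ua, d in config.items() if ua in user_agent), None)
    if domains is None:
        return False  # not a known crawler User-Agent: apply normal limits
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not any(hostname == d or hostname.endswith('.' + d)
                   for d in domains):
            return False  # PTR record points outside the allowed domains
        _, _, addrs = socket.gethostbyname_ex(hostname)  # forward confirm
        return ip in addrs
    except (socket.herror, socket.gaierror):
        return False  # no PTR record, or hostname does not resolve
```

A request passing this check can safely be exempted from anti-DoS countermeasures; anything else is throttled like an ordinary client.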
I'd emphasize the above: it allows *much* more fine-grained ACLs. It's the
difference between a police force that would throw all gypsies in jail because
it had some information suggesting one gypsy stole something, and a police
force that finds the guilty person and puts only him in jail.
Not only does it allow finer-grained ACLs, but it also allows agents to
identify themselves, say as crawlers or as end users. A crawler could quickly
be guided to the relevant dump file or RSS feeds, so that it does not waste
resources on the server. It also allows the user/crawler to tie into linked
data, which means we are applying linked data recursively to solve a linked
data problem. That's the neat bit :-)
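The "guide the crawler to the dump" idea sketched above could be as simple as a routing decision once the agent's type is known from its profile. The sketch below assumes that some WebID step has already classified the agent; `agent_type`, `DUMP_URL`, and `FEED_URL` are all hypothetical names, not anything specified by WebID itself:

```python
# Illustrative only: once an agent is authenticated (e.g. via WebID),
# its profile can say what kind of agent it is, and the server can
# answer accordingly instead of rate-limiting blindly.

DUMP_URL = '/data/dump.nt.gz'    # assumed location of a full data dump
FEED_URL = '/data/changes.rss'   # assumed location of a change feed

def respond(agent_type, requested_path):
    """Route self-declared crawlers to the bulk dump instead of
    letting them walk every resource one HTTP request at a time."""
    if agent_type == 'crawler':
        return ('303 See Other', DUMP_URL)
    return ('200 OK', requested_path)
```

An end user still gets the requested resource, while a crawler is redirected once and then fetches everything in a single cheap download.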
Henry
Social Web Architect
http://bblfish.net/