On 6/22/11 11:26 PM, Henry Story wrote:
On 23 Jun 2011, at 00:11, Alexandre Passant wrote:
On 22 Jun 2011, at 22:49, Richard Cyganiak wrote:
On 21 Jun 2011, at 10:44, Martin Hepp wrote:
PS: I will not release the IP ranges from which the trouble originated, but
rest assured, there were top research institutions among them.
The right answer is: name and shame. That is the way to teach them.
You may have found the right word: teach.
We (as academics) have given tutorials on how to publish and consume LOD, and lots of
material on best practices for publishing, but not much about consuming.
Why not simply come up with reasonable guidelines for this? They could also be
taught in institutes and universities where people use LOD, and in tutorials
given at various conferences.
That is of course a good idea. But longer term you don't want to teach that
way. It's too time-consuming. You need the machines to do the teaching.
Think about Facebook. How did 500 million people come to use it? Because they
were introduced by friends and learned by using it, not by doing tutorials or
taking courses. The system itself teaches people how to use it.
So the same way, if you want to teach people linked data, get the social web
going and they will learn the rest by themselves. If you want to teach crawlers
to behave, make bad behaviour uninteresting. Create a game with rules where good
behaviour is rewarded and bad behaviour has the opposite effect.
This is why I think using WebID can help. You can use the information to build
lists and rankings of good and bad crawlers: people with good crawlers get to
present papers at crawling conferences, while bad crawlers get throttled out.
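The idea of ranking crawlers by observed behaviour could be sketched roughly as below. This is only an illustration, not anything the thread actually implemented: the class and method names (`CrawlerRegistry`, `record_fetch`), the politeness threshold, and the use of a WebID URI as the key are all assumptions made for the example.

```python
# Hypothetical sketch: track crawlers by their WebID URI and rank them
# by how often they request pages faster than a polite delay allows.
from collections import defaultdict

POLITE_DELAY = 1.0  # assumed minimum seconds between requests to count as polite


class CrawlerRegistry:
    def __init__(self):
        self.last_seen = {}                  # webid -> timestamp of last request
        self.violations = defaultdict(int)   # webid -> count of too-fast requests

    def record_fetch(self, webid, now):
        """Record a request; return True if the crawler behaved politely."""
        last = self.last_seen.get(webid)
        self.last_seen[webid] = now
        if last is not None and now - last < POLITE_DELAY:
            self.violations[webid] += 1
            return False
        return True

    def ranking(self):
        """All known crawlers, best-behaved first (fewest violations)."""
        return sorted(self.last_seen, key=lambda w: self.violations[w])
```

A server could consult such a ranking to decide which identified crawlers to serve at full speed and which to throttle.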
Make it so that the system can grow beyond academic and teaching settings, into
the world of billions of users spread across the world, living in different
political institutions and speaking different languages. We have had good
crawling practices since the beginning of the web, but you need to make them
evident and self-teaching.
E.g. a crawler that crawls too much will get slowed down and redirected to pages
on crawling behaviour, written and translated into every single language on the
planet.
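That "slow down and redirect" mechanism could look something like the following minimal sketch, assuming a per-client token-bucket rate limiter on the server side; the etiquette URL and the `handle_request` helper are invented for illustration.

```python
# Minimal sketch of throttling an over-eager crawler and redirecting it
# to a page explaining good crawling behaviour. The URL is a placeholder.
import time

ETIQUETTE_URL = "https://example.org/crawling-etiquette"  # assumed, not real


class TokenBucket:
    """Classic token bucket: allows bursts up to `capacity`, refills at `rate`/s."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle_request(bucket):
    """Serve normally, or redirect an over-eager crawler to the rules page."""
    if bucket.allow():
        return 200, None
    return 302, ETIQUETTE_URL  # HTTP redirect to the behaviour page
```

In a real deployment one bucket would be kept per client IP (or per WebID), and the redirect target could be content-negotiated into the client's language, as Henry suggests.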
+1000
That's the game in a nutshell!
We have to keep virtuous cycles at the core of the increasingly social Web.
Kingsley
Henry
m2c
Alex.
Like Karl said, we should collect information about abusive crawlers so that
site operators can defend themselves. It won't be *that* hard to research and
collect the IP ranges of offending universities.
I started a list here:
http://www.w3.org/wiki/Bad_Crawlers
The list is currently empty. I hope it stays that way.
Thank you all,
Richard
--
Dr. Alexandre Passant,
Social Software Unit Leader
Digital Enterprise Research Institute,
National University of Ireland, Galway
Social Web Architect
http://bblfish.net/
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen