Hi Chris,
Thanks for answering, and many thanks for making a public debate possible (copied to [email protected] with Chris's authorization). The consequences go beyond whoisi alone, which is why I wanted this to be public.
On 24 Jul 2008, at 00:44, Christopher Blizzard wrote:
Hi, Karl! I would love some of your thoughts on some of the things
that I mention below.
On Jul 22, 2008, at 9:35 PM, Karl Dubost wrote:
Hi Christopher,
I noticed this morning that you were aggregating pieces of my personae under the name Karl Dubost, conflating different personalities: professional and personal.
http://whoisi.com/p/3168
OK. Note that people make the connections because the site makes it possible to connect them. The devil is indeed out of the box. Many people will do it simply because they can, or because they don't realize what they are doing. On the other side, by setting up a system which enables this, you take on more responsibility.
Yeah, I've seen some people who have problems with that. I'm not entirely sure what to do about that, given that it's all user-driven data; it's not aggregated by robots or programs. Part of the thing about whoisi is that it makes those disparate connections possible.
Opacity is the property of a medium that governs how much light goes through it (more exactly, it relates to the mean distance a photon travels between two interactions with the medium).
Opacity on the network is greatly reduced because time and space have been dramatically compressed. This has benefits, but also big drawbacks for people's privacy. When I'm walking in a city, I'm in a public space. The local people who see me, and sometimes recognize me, might indeed propagate information about me. But in doing so they will give only a partial rendering, they will forget after a few days, and it will take time for the information to travel between individuals. Opacity maintains the social glue.
A system where everything you say or express is automatically reproduced identically (copied to different places), kept (search engines), and transmitted quickly (the Internet) has strong consequences for individuals, and not all of them are good.
When I talk in a café with a friend, someone might overhear me, but I don't have to protect myself. On the network, these days, I have to be careful and pay close attention to the level of access I give to my information. It deeply changes the way I have to handle my casual information. I'm really careful about this, and I want opt-in systems, not opt-out.
I have removed them for now. I'm pretty sure someone will add them again. I hope not, but we will see.
But there is one thing that seems really bogus in your system. One of the feeds you were aggregating is
http://www.la-grange.net/feed.rdf
I encourage you to see
http://www.la-grange.net/robots.txt
It is explicit:
User-agent: *
Disallow: /
Please fix your RSS reader so that it enforces the robots exclusion protocol.
http://www.robotstxt.org/
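In case it is useful, here is a minimal sketch of what I mean by enforcing it, written against the Python 3 standard library (the "whoisi" user-agent string is only my assumption about what the service identifies itself as):

from urllib import robotparser
from urllib.parse import urljoin
from urllib.request import urlopen

FEED_URL = "http://www.la-grange.net/feed.rdf"
USER_AGENT = "whoisi"

# Fetch and parse the robots.txt of the feed's host.
rules = robotparser.RobotFileParser()
rules.set_url(urljoin(FEED_URL, "/robots.txt"))
rules.read()

# Only download the feed if the robots exclusion rules allow it;
# "User-agent: *" with "Disallow: /" means every robot must stay out.
if rules.can_fetch(USER_AGENT, FEED_URL):
    feed = urlopen(FEED_URL).read()
else:
    feed = None  # the site asked robots not to fetch it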
This is an honest question: what is your expectation about how RSS readers deal with the robots.txt file?
Here I make a distinction between a human and a Web site; that creates a big difference. A person who is reading my Web site through an RSS reader has made a decision to do so. My content being aggregated by an engine that is not under someone's direct control is a no-no: it has become a bot. It is exactly the same distinction I make between a browser (individual control and choice) and a search engine bot (anonymously collective).
My expectation is that it depends on the type of reader, how it redistributes the content, to whom, etc. For now, you cannot tell the user agents on the network how the content you have created may be reproduced.
For example, Google Reader happily adds your site as an RSS feed and clearly has been aggregating data on it for a while. (It has history much longer than what's in your RSS feed, for example.)
Hmm, I'll have to check, because I thought I had blocked them. I had nothing against Google's RSS reader itself, but despite my robots.txt, Google Reader was feeding the Google Search database with my titles and links, bypassing the robots.txt.
whoisi is essentially a big shared RSS reader. Do you think that the rules for whoisi should be different from those for something like Google Reader?
Human versus machine. Yes.
The opening page for robotstxt.org contains this phrase:
"Web Robots (also known as Web Wanderers, Crawlers, or Spiders), are
programs that traverse the Web automatically. Search engines such as
Google use them to index the web content, spammers use them to scan
for email addresses, and they have many other uses."
*Automatically*, and not by the individual choice of someone.
whoisi does no wandering, has no crawlers or spiders. Everything
that is done is driven by user interaction. It's driven by humans,
not robots. :)
Yes. Basically you are demonstrating the effect of mobs. A flash mob can be used for fun, for the benefit of a "good" project, or for tracking people down with nasty effects, each individual thinking that they are doing no harm.
I'd love to have a way to mark things as "don't aggregate this RSS with other entries", but robots.txt doesn't seem like quite the right tool. It's very brute force, and given the robots.txt files on sites like twitter.com, where I do pull a lot of data, it would keep whoisi from pulling information from them. It doesn't seem like the right tool for that kind of job. It's aimed at spiders, not RSS readers.
Maybe it could be something in the feed itself: an element in Atom, RSS 2.0, and RSS 1.0 which states that automatic aggregation is not accepted. That's an interesting topic. I guess I will discuss it with participants at the iCommons Summit in Hokkaido next week.
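Purely as a sketch of that idea (the namespace, the element name, and the value below are invented for illustration; nothing standardized exists yet), an aggregator could look for such an element and skip the feed when it is present:

# Hypothetical Atom extension the feed could carry:
#   <feed xmlns="http://www.w3.org/2005/Atom"
#         xmlns:agg="http://example.org/ns/aggregation">
#     <agg:policy>no-automatic-aggregation</agg:policy>
#     ...
#   </feed>
import xml.etree.ElementTree as ET

AGG_NS = "{http://example.org/ns/aggregation}"

def aggregation_allowed(feed_xml):
    """Return False when the feed declares it refuses automatic aggregation."""
    root = ET.fromstring(feed_xml)
    policy = root.find(AGG_NS + "policy")
    return policy is None or (policy.text or "").strip() != "no-automatic-aggregation"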
Though I have issues with attaching a specific statement to your content, be it the RSS feed, the HTML, etc. By making a statement against some type of aggregation (asking for more opacity), you make yourself more visible (recently, the Boring couple and their house in Google Maps Street View). You could say, for example: do not aggregate my content based on geographical filtering, or do not aggregate it if you intend to make commercial use of it.
I'm asking people for suggestions on what might work in terms of how
to avoid aggregating those kinds of things, to try and protect and
enhance privacy where I can, but the tools aren't quite there. What
do you suggest?
My personal opinion on aggregation, indexing, etc. is that we should give the power back to people. Every aggregation should be opt-in and not opt-out; opt-out systems are far too complex for most people.
Not many people can add this kind of information to their Web site in a .htaccess file:
SetEnvIfNoCase User-Agent ".*Technorati*." bad_bot
SetEnvIfNoCase User-Agent "Microsoft Office" bad_bot
SetEnvIfNoCase User-Agent ".*QihooBot*." bad_bot
SetEnvIfNoCase User-Agent ".*CazoodleBot*." bad_bot
SetEnvIfNoCase User-Agent ".*Acoon-Robot*." bad_bot
SetEnvIfNoCase User-Agent ".*Gigamega*." bad_bot
SetEnvIfNoCase User-Agent ".*MJ12bot*." bad_bot
SetEnvIfNoCase User-Agent ".*yacybot*." bad_bot
SetEnvIfNoCase User-Agent ".*Moreoverbot*." bad_bot
SetEnvIfNoCase User-Agent ".*Tailrank*." bad_bot
SetEnvIfNoCase User-Agent ".*WikioFeedBot*." bad_bot
SetEnvIfNoCase User-Agent ".*NIF/1.1*." bad_bot
SetEnvIfNoCase User-Agent ".*SnapPreviewBot*." bad_bot
SetEnvIfNoCase User-Agent ".*Feedfetcher-Google*." bad_bot
SetEnvIfNoCase User-Agent ".*SPIP-1.8.2*." bad_bot
SetEnvIfNoCase User-Agent ".*whoisi*." bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Thanks for starting the discussion.
Other references for this discussion:
Mitchell Baker has recently published "Why focus on data?"
http://blog.lizardwrangler.com/2008/07/22/why-focus-on-data/
There is also the text from Daniel Weitzner "Reciprocal Privacy (ReP)
for the Social Web"
http://dig.csail.mit.edu/2007/12/rep.html
--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool