Re: [ANN] Introducing Apache Agora - reloaded!

2005-07-14 Thread Will Glass-Husain

Replying to the community list as requested...

Neat app!  Not immediately intuitive as to how to interpret it, but with a
little experimentation I could see patterns.  For example, it was
interesting to notice how my email moved from the outskirts of the circle
with data from early months to the center of the circle in later months (for
the projects I'm involved with).

I'm still unclear on what to look for in terms of community health.  What
are some of the general macro patterns you've seen with this tool?  What
insight does this provide into the community?  The docs provide a good micro
level description of how the app models the relationships between
individuals, but don't discuss the macro patterns that emerge.   It'd be
interesting to hear some of your thoughts.

Best,
WILL


----- Original Message -----
From: Stefano Mazzocchi [EMAIL PROTECTED]

To: Apache Committers [EMAIL PROTECTED]
Sent: Thursday, July 14, 2005 12:14 PM
Subject: [ANN] Introducing Apache Agora - reloaded!



NOTE: please excuse the noise if you are not interested, but there is no
easier way to reach all of you and I thought many of you might be
interested in this.

<hat type="director" mode="off"/>

A few years ago, around the time the incubator started to appear as the
escape valve for the growth problems that some projects were exhibiting,
I started to wonder if there could be a way, for those mentoring and
providing oversight for particular projects, to make their job easier,
especially if they were not participating in the day-to-day work of the
various communities they were helping grow strong and self-sufficient.

The task is very difficult, not only due to the nature of the problem
(and the unstructuredness of the data), but also due to the fact that you
don't want to create more problems than you are solving: for example,
you won't want people to feel spied on or abused by numerical ratings and
rankings.

The result of that thinking was Apache Agora, a system that I designed
and implemented 3 years ago and that has been running (quite silently)
on Nagoya since then.

Since Nagoya is going away, I moved Agora over to minotaur and I have
aligned it with the existing mail archive (the same one that we use to
power our official mod_mbox based archives). Find it at

+--------------------------------------------+
|                                            |
|  http://people.apache.org/~stefano/agora/  |
|                                            |
+--------------------------------------------+


what is this?
-------------

Agora is a community visualizer. If you wonder who is the core of a
particular community (for example, to know who to ask for something) or
how big/active/diverse/balanced a community is, Agora is for you.


how does it work?
-----------------

Agora is composed of two pieces:

1) a python script that reads mbox files and generates 'precooked' data

2) a java applet that reads the precooked data and visualizes it

the script runs every week (on sundays) on minotaur and it's fully
incremental, meaning that it knows where it left off the week before.
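
A minimal sketch of what "incremental" could look like, assuming the
script simply remembers a byte offset per mbox file between runs. The
function name and the JSON state file are hypothetical; the original
does not say how Agora actually tracks its position.

```python
import json
import os


def read_new_bytes(mbox_path, state_path):
    """Return only the bytes appended to mbox_path since the previous call."""
    # Load the offsets saved by earlier runs (hypothetical JSON state file).
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    # Skip what was already processed and read the rest.
    with open(mbox_path, "rb") as f:
        f.seek(state.get(mbox_path, 0))
        new_data = f.read()
        state[mbox_path] = f.tell()

    # Remember where this run stopped for next time.
    with open(state_path, "w") as f:
        json.dump(state, f)
    return new_data
```

Because mbox archives are normally append-only, a weekly run under this
assumption only has to parse the messages that arrived since the last one.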

how about the network?
----------------------

The network is created by harvesting the email addresses and linking
two of them whenever one address replied to a message sent by the
other address.

I say address because an address is not a person, as there might be
several addresses belonging to the same person (and no, the system
doesn't (yet) allow different addresses that belong to the same person
to be smooshed together)
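
The linking step can be sketched roughly as follows. This assumes that
replies are detected through the standard In-Reply-To and Message-ID
headers, which the original does not confirm; the function names are
made up for illustration.

```python
from collections import defaultdict
from email.message import Message
from email.utils import parseaddr


def make_message(sender, msg_id, in_reply_to=None):
    """Build a tiny message for experiments (illustrative helper)."""
    msg = Message()
    msg["From"] = sender
    msg["Message-ID"] = msg_id
    if in_reply_to:
        msg["In-Reply-To"] = in_reply_to
    return msg


def build_reply_graph(messages):
    """Return {replier: {original_sender: reply_count}} keyed by address."""
    messages = list(messages)  # iterated twice below

    # First pass: remember which address sent each Message-ID.
    sender_of = {}
    for msg in messages:
        msg_id = (msg.get("Message-ID") or "").strip()
        addr = parseaddr(msg.get("From", ""))[1].lower()
        if msg_id and addr:
            sender_of[msg_id] = addr

    # Second pass: In-Reply-To links the replier to the original sender.
    # Assumption: self-replies are ignored.
    graph = defaultdict(lambda: defaultdict(int))
    for msg in messages:
        parent = (msg.get("In-Reply-To") or "").strip()
        replier = parseaddr(msg.get("From", ""))[1].lower()
        target = sender_of.get(parent)
        if replier and target and replier != target:
            graph[replier][target] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}
```

Iterating over Python's mailbox.mbox(path) yields message objects that
can be passed to build_reply_graph directly. Nodes are addresses, not
people, so two addresses of the same person stay separate here too.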

In order to reduce noise, the network is then pruned. All addresses that
only received or only sent email are removed from the graph. So, the
resulting graph is a smaller version containing only those nodes that
exhibit minimal connectivity characteristics (this helps to remove, for
example, agents like bugzilla or SVN or spam, that never reply, or lurkers
that don't participate in discussions).
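
One possible reading of the pruning rule, as a sketch: keep an address
only if it both replied to someone and was replied to. The graph shape
({replier: {original_sender: count}}) and the single-pass behaviour are
assumptions made for this example, not a description of Agora's code.

```python
def prune(graph):
    """Keep only addresses that both replied to someone and were replied to."""
    repliers = {src for src, dsts in graph.items() if dsts}
    replied_to = {dst for dsts in graph.values() for dst in dsts}
    keep = repliers & replied_to

    pruned = {}
    for src in keep:
        # Drop links that point at addresses removed by the rule above.
        dsts = {dst: n for dst, n in graph[src].items() if dst in keep}
        if dsts:
            pruned[src] = dsts
    return pruned
```

Under this reading an agent like bugzilla, which gets replies but never
sends one, disappears, and so does an address that replies but is never
answered.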


how do I start using it?
------------------------

The tree on the left lists all the 'precooked data' that agora is able
to understand. This is a mirror of the list of the folders in
/home/apmail/public-arch on minotaur.apache.org and will be
automatically updated when new mail lists are added (so neither infra@
nor I have to do anything! you can always count on my lazy ass ;-)

In order to see anything, you have to click on one of the files on the
tree, wait for a few seconds (until the file icon turns reddish) and
then click on the load button. This will load the data, create the
network, perform the pruning and show it in the graph pane.


cool, I have a graph, now what?
-------------------------------

Click the start button and the graph will clusterize. If you merged
data from different mailing lists, you will see them forming different
groups.

If you click on a node, it will show the address related to that node.

if you right-click anywhere, a fisheye zoom 

Re: [ANN] Introducing Apache Agora - reloaded!

2005-07-14 Thread Ian Holsman
How would you compare it against Microsoft's Netscan
(http://netscan.research.microsoft.com/Static/Default.asp),
which also tries to find the main contributors in different communities?

Is 'agora' public knowledge?

what does the 'decay' area do?

How does one differentiate between a useful communication and a flame 
war? I remember seeing Mark Smith (the netscan developer) talk about how 
he could identify the different types via the length of the conversation.


Overall a big '+1'


Re: [ANN] Introducing Apache Agora - reloaded!

2005-07-14 Thread Stefano Mazzocchi
Will Glass-Husain wrote:
> Replying to the community list as requested...

Thank you.

> Neat app!  Not immediately intuitive as to how to interpret it, but with a
> little experimentation I could see patterns.  For example, it was
> interesting to notice how my email moved from the outskirts of the circle
> with data from early months to the center of the circle in later months
> (for the projects I'm involved with).
>
> I'm still unclear on what to look for in terms of community health.

eheh, I'm not sure either :-)

> What are some of the general macro patterns you've seen with this tool?

First of all, the 'size' of the pruned graph is generally a good sign
because it means there is less chance of a few key players moving out of
the project and leaving the social network disconnected.

Another interesting thing is that the people at the center are actually
the people I expect to be there. In projects that I follow, I was hardly
ever surprised: the distance of their node from the 'center of social
gravity' of the community was always (and I mean *always*) reasonable.

I don't know about the projects that I don't follow, but I've never
heard anybody complain.

I also found it to be very effective in understanding how much
traction/influence a person might have in a community by dragging their
node. Sometimes, if several people are involved in a discussion, I pull
their nodes apart and see where the center of gravity shifts. Normally
the result of the discussion tends to settle toward the person that
moved the graph more.

This is amazing, because agora does *NOT* even try to understand what
the messages say, but only that the message did happen.

I suspect there is a deep reason for the apparently incredible signal: in
well-behaved communities, people do not reply if they don't have
anything to say.

I suspect Agora would fail miserably to be as effective in dysfunctional
communities where people keep emailing each other with flamewars.
Luckily, this is rarely the case in the foundation.

> What insight does this provide into the community?  The docs provide a good
> micro level description of how the app models the relationships between
> individuals, but don't discuss the macro patterns that emerge.   It'd be
> interesting to hear some of your thoughts.

I wrote this years ago, as an experiment. Then I started to use it more
and more as a 'telescope' to look at communities that I didn't know, to
understand who were the key players in those communities or, if I heard
something worrisome about somebody, whether or not to worry that it
could have a big impact on a particular community.

Unfortunately, this came before the incubator was set up, so the mail
archive on nagoya, which was based on eyebrowse, was kinda left alone and
a lot of the mailing lists were not there. Some people from the
incubator wanted to evaluate the growth of their projects with Agora, but
they couldn't.

There seems to be a lot of information in there. I have my own way of
using it but I don't know if it's a general rule and I don't want people
to think that their project is better than another just because their
graph is bigger or more densely connected.

But it is fascinating to compare different mailing lists, especially
over time. For example, whether 'dev' is more or less densely
connected than 'users'.

And it's also very useful to understand the 'bridges', the people that
write email in more than one mailing list. Those are very important
people for the ASF, as they bring cross-pollination and allow information
to flow through the various islands (and improve our ability to
evolutionarily adapt to change in the technical and social ecosystem).
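
Spotting such bridges can be sketched as below, assuming you already
have the set of posting addresses for each list; the function name and
the input shape are illustrative only.

```python
from collections import defaultdict


def find_bridges(list_members):
    """Map each address seen on two or more lists to the lists it posts to.

    list_members: {list_name: set of addresses that posted there}
    """
    lists_of = defaultdict(set)
    for list_name, addresses in list_members.items():
        for address in addresses:
            lists_of[address].add(list_name)
    # A bridge is any address that shows up on more than one list.
    return {a: sorted(names) for a, names in lists_of.items() if len(names) > 1}
```

Since the graph works on addresses rather than people, somebody who
posts from a different address on each list would not show up as a
bridge in this sketch.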

It's a social telescope. And normally it's a lot of fun to use
telescopes, even if you don't understand everything about why the
stars and galaxies are the way they are. I feel the same way about
Agora: you don't have to have a model of what is happening absolutely,
as long as you can spot differences between various projects.

But I don't know the metric for community health and I don't think such
a thing even exists, so if that's what you are looking for, you are not
going to get it from Agora (nor anything I do).

-- 
Stefano.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [ANN] Introducing Apache Agora - reloaded!

2005-07-14 Thread Stefano Mazzocchi
Ian Holsman wrote:
> How would you compare it against Microsoft's Netscan
> (http://netscan.research.microsoft.com/Static/Default.asp),
> which also tries to find the main contributors in different communities?

I think 'main' implies metrics and I really didn't want to go there. I
think contribution is inversely proportional to the distance from the
center of gravity of the group, but I wanted to keep it subjective to
avoid building altars that people want to fight to step on.


Re: [ANN] Introducing Apache Agora - reloaded!

2005-07-14 Thread Stefano Mazzocchi
Stefano Mazzocchi wrote:
> Ian Holsman wrote:
>
>> How would you compare it against Microsoft's Netscan
>> (http://netscan.research.microsoft.com/Static/Default.asp),
>> which also tries to find the main contributors in different communities?
>
> I think 'main' implies metrics and I really didn't want to go there. I
> think contribution is inversely proportional to the distance from the
> center of gravity of the group, but I wanted to keep it subjective to
> avoid building altars that people want to fight to step on.

sorry, hit sent too soon.

>> Is 'agora' public knowledge?

no 'private' mail list is being analyzed, so yes, it's public knowledge.

it has not been widely publicized (yet) but I wouldn't be against
putting it in a more visible position on the apache.org web site.

>> what does the 'decay' area do?

if you reply once to a message of mine, agora creates a link between
you and me of strength 1.0, then if you reply again this gets
added. Note that links are directional: you might reply a lot to me while
I never reply to you, and this is still taken into account by the graph
drawing algorithm.

Decay means that you get 1.0 if you reply now and exponentially lower
value if your reply was earlier in time.
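
As a rough illustration of the idea, here is a sketch in which each
reply contributes 1.0 when it is brand new and exponentially less as it
ages. The half-life parameter and its default of 90 days are
assumptions chosen for the example; the original gives no decay
constant.

```python
import math


def decayed_strength(reply_ages_days, half_life_days=90.0):
    """Sum the contributions of all replies from one address to another.

    reply_ages_days: how many days ago each reply was sent.
    half_life_days: hypothetical tuning knob, the age at which a reply
    counts for half of a fresh one.
    """
    rate = math.log(2) / half_life_days
    return sum(math.exp(-rate * age) for age in reply_ages_days)
```

With these assumed numbers, a reply sent today plus one sent 90 days
ago gives a link strength of about 1.5 instead of the undecayed 2.0.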

I introduced this because I was curious about how much the past of a
project (especially if you load a lot of months of a project in memory)
was influencing its present.

Rather surprisingly, decay does *NOT* introduce substantial difference
in the way the graph is shaped or the position of people in the graph,
which is a very very interesting property and I have no idea why that is
the case.

>> How does one differentiate between a useful communication and a flame
>> war?

There is no attempt to do so, ATM.

>> I remember seeing Mark Smith (the netscan developer) talk about how
>> he could identify the different types via the length of the conversation.

As I mentioned earlier, we don't tend to host a lot of inflammatory
people in Apache (don't really know why, I suspect it is a historical
thing, or a habit of not reacting aggressively to aggression, which makes
flame lovers go somewhere else, but I don't know how to test this
hypothesis), and this keeps the signal/noise ratio high.

Identifying a conversation means that at least *you* can pretend to
understand the difference between inflammatory and not. I suspect this
difference is also very cultural: a conversation that has a 'normal' tone
in one community might be considered very 'strident' in another. I'm
sure I'm not the only one who has experienced this.

At the end of the day, I'm a big fan of the love/hate hypothesis:
replying to somebody indicates a sort of preferential attachment, no
matter what you are saying. Ignoring them is the only signal that the
communication is not useful.

NOTE: I do *not* think that the size of the social cluster is an
indication of health, there is something else that influences it... but
I don't know what it is (yet).

>> Overall a big '+1'

Thanks.

-- 
Stefano.

