Sorry for being late to the party. Clearly the quest for data is a
commonly shared one, with many different approaches, questions, and
reporting/results.
One of the solutions already mentioned is the sugar-stats package,
originally developed by Aleksey, which has now been part of dextrose-sugar
builds, and of the server side (XSCE), for over a year.
http://wiki.sugarlabs.org/go/Platform_Team/Usage_Statistics
The approach we followed was to collect as much data as possible without
interfering with the Sugar APIs or code. The project has made slow progress
on the visualization front, but the data-collection side has already been
field-tested.
I for one think there are a few technical trade-offs, which lead to larger
strategy decisions:
* Context vs. universality ... Ideally we'd like to collect (activity-)
context-specific data, but that requires tinkering with the Sugar API
itself and with each activity. The flip side is that we might be ignoring
the other types of data a server might be collecting ... internet usage and
the various other logfiles in /var/log
* Static vs. dynamic ... Analyzing journal backups is great, but they are
ultimately limited in time resolution by the datastore's design itself.
So the key question is: what's valuable? ... a) Frequency counts of
activities? b) Data at up-to-the-minute resolution, such as which
activities are running, which activity is active (visible when),
collaborators over time ... etc.
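To make option (a) concrete, a frequency count can be derived from datastore metadata alone, without touching the Sugar APIs. The sketch below is a rough illustration, not a tested recipe: it assumes the common on-disk layout where each datastore entry keeps a metadata/activity file naming the activity bundle (the real path would be something like ~/.sugar/default/datastore, and layouts vary across Sugar versions).

```python
# Hedged sketch: counting activity frequencies from Sugar datastore
# metadata. The layout (one directory per entry, with a
# "metadata/activity" file holding the bundle id) is an assumption;
# adjust for the datastore version actually deployed.
import os
import tempfile
from collections import Counter

def activity_frequencies(datastore_root):
    """Walk a datastore tree and count entries per activity bundle id."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(datastore_root):
        if os.path.basename(dirpath) == "metadata" and "activity" in filenames:
            with open(os.path.join(dirpath, "activity")) as f:
                bundle_id = f.read().strip()
            if bundle_id:
                counts[bundle_id] += 1
    return counts

# Demonstrate against a small synthetic tree standing in for a real
# backup of ~/.sugar/default/datastore:
root = tempfile.mkdtemp()
for uid, bundle in [("a1", "org.laptop.WebActivity"),
                    ("b2", "org.laptop.WebActivity"),
                    ("c3", "org.laptop.AbiWordActivity")]:
    meta = os.path.join(root, uid, "metadata")
    os.makedirs(meta)
    with open(os.path.join(meta, "activity"), "w") as f:
        f.write(bundle)

print(activity_frequencies(root))
```

This only answers (a); the dynamic data in (b) — which activity is visible when, collaborators over time — cannot be reconstructed from such static snapshots, which is exactly the resolution limit mentioned above.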
In my humble opinion, the next steps could be:
1. Get better on the visualization front.
2. Search for more context. Maybe arm the sugar-datastore to collect
higher-resolution data.
On Tue, Jan 7, 2014 at 12:24 PM, Christophe Guéret
christophe.gue...@dans.knaw.nl wrote:
Dear Sameer, all,
That's a very interesting blog post and discussion. I agree that
collecting data is important, but knowing what questions are to be
answered with that data is even more so. If you need help with that last
bit, I could propose to use the journal data as a use-case for the project
KnowEscape ( http://knowescape.org/ ). This project is about getting
insights out of large knowledge spaces via visualisation. There is a wide
(European) community of experts behind it, coming from different research
fields (humanities, physics, computer science, ...). Something useful could
maybe come out of it...
I would also like to refer you to the project ERS, which we have now almost
finished. This project is an extension of the ideas behind SemanticXO,
which some of you may remember. We developed a decentralised entity
registry system with the XO as a primary platform for coding and testing.
There is a
description of the implementation and links to code on
http://ers-devs.github.io/ers/ . We also had a poster at OLPC SF (thanks
for that !).
In a nutshell, ERS creates global and shared knowledge spaces through
series of statements. For instance, "Amsterdam is in the Netherlands" is a
statement made about the entity Amsterdam, relating it to the entity the
Netherlands. Every user of ERS may want to either de-reference an entity
(*e.g.*, asking for all pieces of information about Amsterdam) or
contribute to the content of the shared space by adding new statements.
This is made possible via Contributors nodes, one of the three types of
node defined in our system. Contributors can interact freely with the
knowledge base. They themselves take care of publishing their own
statements but cannot edit third-party statements. Every set of statements
about a given entity contributed by a single author is wrapped into a
document in CouchDB to avoid conflicts and enable provenance tracking.
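The one-document-per-(entity, author) rule above can be pictured with a small sketch. The field names and id scheme below are illustrative assumptions on my part, not the actual ERS schema — see the ers-devs link for the real implementation:

```python
# Hedged sketch: one document wrapping all statements a single author
# makes about a single entity, as it might be stored in CouchDB.
# The "_id" scheme and field names are made up for illustration.
import json

def statement_doc(entity, author, statements):
    """Bundle one author's statements about one entity into a single doc.

    Keying the document on (entity, author) means two authors never write
    to the same document, which avoids update conflicts and keeps
    provenance (who said what) intact.
    """
    return {
        "_id": "%s|%s" % (entity, author),   # illustrative id scheme
        "entity": entity,
        "author": author,
        "statements": statements,            # list of (predicate, value)
    }

doc = statement_doc(
    "urn:ers:Amsterdam",
    "urn:ers:contributor:xo-42",
    [("isIn", "urn:ers:Netherlands")],
)
print(json.dumps(doc, indent=2))
```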
Every single XO is a Contributor. Two Contributors in a closed P2P network
can freely create and share Linked Open Data. In order for them to share
data with another closed group of Contributors, we have Bridges. A
Bridge is a relay between two closed networks using the internet or any
other form of direct connection to share data. Two closed communities, for
example two schools, willing to share data can each setup one Bridge and
connect these two nodes to each other. The Bridges will then collect and
exchange data coming from the Contributors. These Bridges are not
Contributors themselves; they are just used to ship data (named graphs)
around and can be shut down or replaced without any data loss. Lastly, the
third component we define in our architecture is the Aggregator. This is
a special node every Bridge may push content to and get updated content
from. As its name suggests, an Aggregator is used to aggregate entity
descriptions that are otherwise scattered among all the Contributors. When
deployed, an aggregator can be used to access and expose the global content
of the knowledge space or a subset thereof.
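The Contributor / Bridge / Aggregator flow described above can be sketched as a tiny in-memory simulation. In the real system each store would be a CouchDB instance and the exchanges would be replications between them; the class names and school names here are mine, purely for illustration:

```python
# Hedged sketch: the three ERS node types as an in-memory simulation.
# Real nodes are CouchDB-backed; this just shows who copies what to whom.

class Node:
    """A store of documents (named graphs), keyed by document id."""
    def __init__(self, name):
        self.name = name
        self.docs = {}

    def receive(self, docs):
        # Merging by id is a simplification of CouchDB replication.
        self.docs.update(docs)

class Contributor(Node):
    def publish(self, doc_id, statements):
        # A Contributor only ever writes its own documents, so it can
        # never clobber a third party's statements.
        self.docs[doc_id] = {"author": self.name, "statements": statements}

def bridge_exchange(bridge_a, bridge_b):
    """Two Bridges relay everything they have collected to each other."""
    a_docs, b_docs = dict(bridge_a.docs), dict(bridge_b.docs)
    bridge_a.receive(b_docs)
    bridge_b.receive(a_docs)

# Two closed school networks, one XO (Contributor) in each:
xo1, xo2 = Contributor("xo-1"), Contributor("xo-2")
xo1.publish("Amsterdam|xo-1", [("isIn", "Netherlands")])
xo2.publish("Paris|xo-2", [("isIn", "France")])

bridge1, bridge2 = Node("bridge-school-1"), Node("bridge-school-2")
bridge1.receive(xo1.docs)          # each Bridge collects locally
bridge2.receive(xo2.docs)
bridge_exchange(bridge1, bridge2)  # the two schools share data

aggregator = Node("aggregator")
aggregator.receive(bridge1.docs)   # either Bridge can push upstream
print(sorted(aggregator.docs))     # both schools' entities are visible
```

Note that shutting down a Bridge in this model loses nothing: its documents still live on the Contributors (and, once pushed, on the Aggregator), which is the replaceability property described above.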
One could use ERS to store (part of) the content of the Journal on an XO
(Contributor), cluster information at the school level (a Bridge put on the
XS), and provide higher-level analysis