Re: [Sugar-devel] The quest for data

2014-01-06 Thread Walter Bender
On Mon, Jan 6, 2014 at 3:00 PM, Sameer Verma  wrote:
> On Mon, Jan 6, 2014 at 4:50 AM, Walter Bender  wrote:
>> On Mon, Jan 6, 2014 at 3:48 AM, Martin Dluhos  wrote:
>>> On 4.1.2014 10:44, Sameer Verma wrote:
>>>
 True. Activities do not report end times, or whether the frequency
 count is for the number of times a "new" activity was started, or if
 it was simply a resumption of the previous instance. Walter had
 indicated that there is some movement in this direction to gather end
 times.
>>>
>>> This would indeed be very useful. Is anyone working on implementing these
>>> features?
>>
>> The frequency count is a count of the number of times an instance of
>> an activity has been opened. The number of new instances can be
>> determined by the number of instance entries in the Journal.
>>
>
> Walter,
> From a conversation we had some time ago, you had pointed out that
> TuxMath does not necessarily stick to this regimen. Every time one
> resumes an instance, it gets counted as a new instance. I haven't gone
> back to verify this, but how consistent is this behavior across
> activities? Can this behavior be standardized?

I am not sure about TuxMath (or Tuxpaint, Scratch, or Etoys), none of
which are native Sugar activities. But the behavior I described is
standard across native Sugar activities.

-walter
>>>
 Yes, the methods that use the datastore as a source rely on the
 Journal, but the sugar-stats system does not. I believe it collects in
 GNOME as well.
>>>
>>> Have you done any processing, analysis, or visualization of the sugar-stats
>>> data? Is that something that you are planning to integrate into OLPC 
>>> Dashboard?
>>
>> There is an app for letting the user visualize their own stats
>> (Journal Stats). It could use some love and attention.
>>
>
> This is an excellent example of providing meaningful feedback with
> respect to the scope. To borrow the Zoom metaphor, I see the Journal
> stats as sitting at the level where the scope is local to the child. The
> same view zooms out to the level of the teacher, principal, district
> education officer, MoE, etc.
>
> cheers,
> Sameer
>
>>>
 4) The reporting can be done via visualization and/or by
 generating periodic reports. The reporting should be specific to the
 person(s) looking at it. No magic there.
>>>
>>> I think that many questions (some of which we already mentioned above) can 
>>> be
>>> answered with reports and visualizations, which are not deployment 
>>> specific. For
>>> example, those you are targeting with OLPC dashboard.
>>>

 How the data will be used remains to be seen. I have not seen it being
 used in any of the projects that I know of. If others have seen/done
 so, it would help to hear from them. I know that in conversations and
 presentations to decision makers, the usual sore point is "can you
 show us what you have so far?" For Jamaica, we have used a basic
 exploratory approach on the Journal data, corroborated with structured
  interviews with parents, teachers, etc. So, for instance, the data we
 have shows a relatively large frequency of use of TuxMath (even with
 different biases). However, we have qualitative evidence that supports
 both usage of TuxMath and improvement in numeracy (standardized test).
 We can support strong(er) correlation, but cannot really establish
 causality. The three data points put together make for a compelling
 case.
>>>
>>> I think this is a really important point to emphasize: None of these 
>>> approaches
>>> to evaluation provides the complete picture, but all of these used in 
>>> aggregate
>>> can provide useful insights. Here at OLE Nepal, we already use standardized
>>> testing to compare students' performance before and after the program launch.
>>> We also follow up with teachers through conversations and surveys on regular
>>> support visits. I agree with Sameer that supplementing those with statistical
>>> data can make for a much stronger case.
>>>
>>> Martin
>>>
>>
>>
>>
>> --
>> Walter Bender
>> Sugar Labs
>> http://www.sugarlabs.org
>>
>>



-- 
Walter Bender
Sugar Labs
http://www.sugarlabs.org
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: [Sugar-devel] The quest for data

2014-01-06 Thread Sameer Verma
On Mon, Jan 6, 2014 at 12:28 AM, Martin Dluhos  wrote:
> On 3.1.2014 04:09, Sameer Verma wrote:
>> Happy new year! May 2014 bring good deeds and cheer :-)
>>
>> Here's a blog post on the different approaches (that I know of) to data
>> gathering across different projects. Do let me know if I missed anything.
>>
>> cheers,
>> Sameer
>>
>> http://www.olpcsf.org/node/204
>
> Thanks for putting together the summary, Sameer. Here is more information 
> about
> my xo-stats project:
>
> The project's objective is to determine how XOs are used in Nepalese
> classrooms, but I intend for the implementation to be general enough
> that it can be reused by other deployments as well. As with other projects
> you've mentioned, I separated the project into four stages:
>
> 1) collecting data from the XO Journal backups on the schoolserver
> 2) extracting the data from the backups and storing it in an appropriate 
> format
> for analysis and visualization
> 3) statistically analyzing and visualizing the captured data
> 4) formulating recommendations for improving the program based on the 
> analysis.
>
> Stage 1 is already implemented on both the server and the client
> side, so I first focused on the next step of extracting the data. Initially, I
> wanted to reuse an existing script, but I eventually found that none of them
> were general enough to meet my criteria. One of my goals is to make the script
> work on any version of Sugar.
>
> Thus, I have been working on process_journal_stats.py, which takes a '/users'
> directory with XO Journal backups as input, pulls out the Journal metadata, and
> writes it to a CSV or JSON file.
>
> Journal backups can be in a variety of formats depending on the version
> of Sugar. The script currently supports the backup format present in Sugar
> versions 0.82-0.88, since the laptops distributed in Nepal are XO-1s running Sugar
> 0.82. I am planning to add support for later versions of Sugar in the next
> version of the script.
>
> The script currently supports two ways to output statistical data. To produce
> all statistical data from the Journal, one row per Journal record:
>
> process_journal_stats.py all
>
> To extract statistical data about the use of activities on the system, use:
>
> process_journal_stats.py activity
>
> Full documentation of all the options is in the README at:
>
> https://github.com/martasd/xo-stats
>
> One challenge of the project has been determining how much data processing to 
> do
> in the Python script and what to leave for the data analysis and visualization
> tools later in the workflow. For now, I stopped adding features to the script
> and I am evaluating the most appropriate tools to use for visualizing the
> data.
>
> Here are some of the questions I am intending to answer with the 
> visualizations
> and analysis:
>
> * How many times do installed activities get used? How does activity use
> differ over time?
> * Which activities are children using to create files? What kind of files are
> being created?
> * Which activities are being launched in share-mode and how often?
> * During which part of the day do children play with the activities?
> * How does the set of activities used evolve as children age?
>
> I am also going to be looking at how answers to these questions vary from class
> to
> class, school to school, and region to region.
>
> As Martin Abente and Sameer mentioned above, our work needs to be informed by
> discussions with the stakeholders: children, educators, parents, school
> administrators, etc. We do have educational experts among the staff at OLE,
> have worked with more than 50 schools altogether, and I will be talking to 
> them
> as I look beyond answering the obvious questions.
>

We should start a list on the wiki to collate this information. I'll
get someone from Jamaica to provide some feedback as well.

> For visualization, I have explored using LibreOffice and SOFA, but neither of
> those was flexible enough to allow customization of the output beyond a few
> rudimentary options, so I started looking at various Javascript libraries, 
> which
> are much more powerful. Currently, I am experimenting with Google Charts, 
> which
> I found the easiest to get started with. If I run into limitations with Google
> Charts in the future, others on my list are InfoVIS Toolkit
> (http://philogb.github.io/jit) and HighCharts (http://highcharts.com). Then,
> there is also D3.js, but that's a bigger animal.

Keep in mind that if you want to visualize at the school's local
XS[CE], you may have to rely on a locally hosted JS library instead of
an online one.

>
> Alternatively or perhaps in parallel, I am also willing to join efforts to
> improve the OLPC Dashboard, which is trying to answer very similar questions 
> to
> mine.

I'll ping Leotis (cc'd) to push his dashboard code to github, so we
don't reinvent.

cheers,
Sameer

>
> I am looking forward to collaborating with everyone who is interested in
> exploring ways to analyze and visualize OLPC/Sugar data in an interesting and
> meaningful way.

Re: [Sugar-devel] The quest for data

2014-01-06 Thread Sameer Verma
On Mon, Jan 6, 2014 at 4:50 AM, Walter Bender  wrote:
> On Mon, Jan 6, 2014 at 3:48 AM, Martin Dluhos  wrote:
>> On 4.1.2014 10:44, Sameer Verma wrote:
>>
>>> True. Activities do not report end times, or whether the frequency
>>> count is for the number of times a "new" activity was started, or if
>>> it was simply a resumption of the previous instance. Walter had
>>> indicated that there is some movement in this direction to gather end
>>> times.
>>
>> This would indeed be very useful. Is anyone working on implementing these
>> features?
>
> The frequency count is a count of the number of times an instance of
> an activity has been opened. The number of new instances can be
> determined by the number of instance entries in the Journal.
>

Walter,
From a conversation we had some time ago, you had pointed out that
TuxMath does not necessarily stick to this regimen. Every time one
resumes an instance, it gets counted as a new instance. I haven't gone
back to verify this, but how consistent is this behavior across
activities? Can this behavior be standardized?

>>
>>> Yes, the methods that use the datastore as a source rely on the
>>> Journal, but the sugar-stats system does not. I believe it collects in
>>> GNOME as well.
>>
>> Have you done any processing, analysis, or visualization of the sugar-stats
>> data? Is that something that you are planning to integrate into OLPC 
>> Dashboard?
>
> There is an app for letting the user visualize their own stats
> (Journal Stats). It could use some love and attention.
>

This is an excellent example of providing meaningful feedback with
respect to the scope. To borrow the Zoom metaphor, I see the Journal
stats as sitting at the level where the scope is local to the child. The
same view zooms out to the level of the teacher, principal, district
education officer, MoE, etc.

cheers,
Sameer

>>
>>> 4) The reporting can be done via visualization and/or by
>>> generating periodic reports. The reporting should be specific to the
>>> person(s) looking at it. No magic there.
>>
>> I think that many questions (some of which we already mentioned above) can be
>> answered with reports and visualizations, which are not deployment specific. 
>> For
>> example, those you are targeting with OLPC dashboard.
>>
>>>
>>> How the data will be used remains to be seen. I have not seen it being
>>> used in any of the projects that I know of. If others have seen/done
>>> so, it would help to hear from them. I know that in conversations and
>>> presentations to decision makers, the usual sore point is "can you
>>> show us what you have so far?" For Jamaica, we have used a basic
>>> exploratory approach on the Journal data, corroborated with structured
>>>  interviews with parents, teachers, etc. So, for instance, the data we
>>> have shows a relatively large frequency of use of TuxMath (even with
>>> different biases). However, we have qualitative evidence that supports
>>> both usage of TuxMath and improvement in numeracy (standardized test).
>>> We can support strong(er) correlation, but cannot really establish
>>> causality. The three data points put together make for a compelling
>>> case.
>>
>> I think this is a really important point to emphasize: None of these 
>> approaches
>> to evaluation provides the complete picture, but all of these used in 
>> aggregate
>> can provide useful insights. Here at OLE Nepal, we already use standardized
>> testing to compare students' performance before and after the program launch.
>> We also follow up with teachers through conversations and surveys on regular
>> support visits. I agree with Sameer that supplementing those with statistical
>> data can make for a much stronger case.
>>
>> Martin
>>
>
>
>
> --
> Walter Bender
> Sugar Labs
> http://www.sugarlabs.org
>
>
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: [Sugar-devel] The quest for data

2014-01-06 Thread Sameer Verma
On Sun, Jan 5, 2014 at 5:03 PM, Andreas Gros  wrote:
> Great utilization of CouchDB and its views feature! That's definitely
> something we can build on. But more importantly, to make this meaningful, we
> need more data.

I like this approach as well because the aggregation is offloaded to
CouchDB through views and reduce/rereduce, so we can have a fairly
independent choice of JavaScript-based visualization frontend, be it
Google Charts (https://developers.google.com/chart/) or D3.js
(http://d3js.org/).
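
A minimal sketch of what that offloading looks like, assuming a CouchDB
database named "journal_stats" that holds one document per Journal record
with an "activity" field (the database, design-doc, and view names here are
hypothetical):

import couchdb  # couchdb-python

server = couchdb.Server('http://localhost:5984/')
db = server['journal_stats']

# Map/reduce pair: emit one count per Journal record, keyed by activity.
# The built-in _sum reduce handles both reduce and rereduce, so the
# aggregation happens entirely inside CouchDB.
db['_design/stats'] = {
    'views': {
        'launches_by_activity': {
            'map': "function(doc) { if (doc.activity) emit(doc.activity, 1); }",
            'reduce': '_sum',
        }
    }
}

# group=True collapses the reduction per key: one total per activity.
for row in db.view('stats/launches_by_activity', group=True):
    print(row.key, row.value)

Whatever charting library sits on top then only has to fetch the view's
JSON output, which is what keeps the frontend choice independent.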

> It's good to know which activities are used most, so one can
> come up with a priority list for improvements and/or focus developer
> attention.
> CouchDB allows pulling data together from different instances, which should
> make aggregation and comparisons between projects possible. And for projects
> that are not online, the data could be transferred to a USB stick quite
> easily and then uploaded to any other DB instance.
>

True. CouchDB will allow for aggregation across classes, schools,
districts, etc. Depending on different projects' willingness to
participate, we can certainly go cross-project. Even if these
views are not made public, they will be useful. For instance, I would
love to compare my Jamaica projects with my India projects with my
Madagascar projects.

> Is there a task/todo list somewhere?
>

Not that I know of, but we can always start one on the sugarlabs wiki.
Anybody have suggestions?

Sameer

> Andi
>
> On Fri, Jan 3, 2014 at 11:16 AM, Sameer Verma  wrote:
>>
>> On Fri, Jan 3, 2014 at 4:15 AM, Martin Abente
>>  wrote:
>> > Hello Sameer,
>> >
>> > I totally agree we should join efforts for a visualization solution,
>> > but,
>> > personally, my main concern is still a basic one: what are the
>> > important
>> > questions we should be asking? And how can we answer these questions
>> > reliably? Even though most of us have experience in deployments and
>> > their
>> > needs, we are engineers, not educators or decision makers.
>> >
>>
>> Agreed. It would be helpful to have a conversation on what the various
>> constituencies need (different from want) to see at their level: the
>> child, the parents/guardians, the teacher, the
>> principal/administrator, and the educational bureaucracy. We should also
>> consider the needs of those of us who have to fundraise by showing the
>> progress of ongoing efforts.
>>
>> > I am sure that most of our collection approaches cover pretty much the
>> > trivial stuff like: what are they using, when are they using it, how
>> > often
>> > they use it, and all kinds of things that derive directly from Journal
>> > metadata, plus the extra insight that comes when considering different
>> > demographics.
>>
>> True. Basic frequency counts such as frequency of use of activities,
>> usage by time of day, day of week, and scope of collaboration are a few
>> simple ones. Comparing one metric against another will need more
>> thinking. That's where we should talk to the constituents.
>>
>> >
>> > But if we could also work together on that (including the trivial
>> > questions), it would be a good step forward. Once we identify these
>> > questions
>> > and figure out how to answer them, it would be a lot easier to think
>> > about
>> > visualization techniques, etc.
>>
>> If the visualization subsystem (underlying tech pieces) is common and
>> flexible, then we can start with a few basic templates, and make it
>> extensible, so we can all aggregate, collate, and correlate as needed.
>> I'll use an example that I'm familiar with. We looked at CouchDB for
>> two reasons: 1) It allows for sync over intermittent/on-off
>> connections to the Internet and 2) CouchDB has a "views" feature which
>> provides selective subsets of the data, and the "reduce" feature does
>> aggregates. The actual visual is done in Javascript. Here's the
>> example Leotis had at the OLPC SF summit
>> (http://108.171.173.65:8000/).
>> >
>> > What do you guys think?
>> >
>>
>> A great start for a great year ahead!
>>
>> > Saludos,
>> > tch.
>>
>> cheers,
>> Sameer
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: [Sugar-devel] The quest for data

2014-01-06 Thread Walter Bender
On Mon, Jan 6, 2014 at 3:48 AM, Martin Dluhos  wrote:
> On 4.1.2014 10:44, Sameer Verma wrote:
>
>> True. Activities do not report end times, or whether the frequency
>> count is for the number of times a "new" activity was started, or if
>> it was simply a resumption of the previous instance. Walter had
>> indicated that there is some movement in this direction to gather end
>> times.
>
> This would indeed be very useful. Is anyone working on implementing these
> features?

The frequency count is a count of the number of times an instance of
an activity has been opened. The number of new instances can be
determined by the number of instance entries in the Journal.
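
To make the distinction concrete, here is a rough sketch, assuming a CSV
with one row per Journal record (e.g. as produced by
process_journal_stats.py) and a "launch-times" column listing open
timestamps -- both column names are assumptions, since the recorded
metadata varies across Sugar versions:

import csv
from collections import Counter

instances = Counter()  # new instances: one Journal entry each
launches = Counter()   # times opened: every resume of an entry counts

with open('journal_stats.csv', newline='') as f:
    for row in csv.DictReader(f):
        activity = row.get('activity') or 'unknown'
        instances[activity] += 1
        # 'launch-times' (where present) holds comma-separated open
        # timestamps; its length is the per-instance open count.
        times = [t for t in (row.get('launch-times') or '').split(',') if t]
        launches[activity] += max(len(times), 1)

for activity in instances:
    print(activity, 'instances:', instances[activity],
          'opens:', launches[activity])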

>
>> Yes, the methods that use the datastore as a source rely on the
>> Journal, but the sugar-stats system does not. I believe it collects in
>> GNOME as well.
>
> Have you done any processing, analysis, or visualization of the sugar-stats
> data? Is that something that you are planning to integrate into OLPC 
> Dashboard?

There is an app for letting the user visualize their own stats
(Journal Stats). It could use some love and attention.

>
> 4) The reporting can be done via visualization and/or by
>> generating periodic reports. The reporting should be specific to the
>> person(s) looking at it. No magic there.
>
> I think that many questions (some of which we already mentioned above) can be
> answered with reports and visualizations, which are not deployment specific. 
> For
> example, those you are targeting with OLPC dashboard.
>
>>
>> How the data will be used remains to be seen. I have not seen it being
>> used in any of the projects that I know of. If others have seen/done
>> so, it would help to hear from them. I know that in conversations and
>> presentations to decision makers, the usual sore point is "can you
>> show us what you have so far?" For Jamaica, we have used a basic
>> exploratory approach on the Journal data, corroborated with structured
>>  interviews with parents, teachers, etc. So, for instance, the data we
>> have shows a relatively large frequency of use of TuxMath (even with
>> different biases). However, we have qualitative evidence that supports
>> both usage of TuxMath and improvement in numeracy (standardized test).
>> We can support strong(er) correlation, but cannot really establish
>> causality. The three data points put together make for a compelling
>> case.
>
> I think this is a really important point to emphasize: None of these 
> approaches
> to evaluation provides the complete picture, but all of these used in 
> aggregate
> can provide useful insights. Here at OLE Nepal, we already use standardized
> testing to compare students' performance before and after the program launch.
> We also follow up with teachers through conversations and surveys on regular
> support visits. I agree with Sameer that supplementing those with statistical
> data can make for a much stronger case.
>
> Martin
>



-- 
Walter Bender
Sugar Labs
http://www.sugarlabs.org
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: [Sugar-devel] The quest for data

2014-01-06 Thread Martin Dluhos
On 4.1.2014 10:44, Sameer Verma wrote:

> True. Activities do not report end times, or whether the frequency
> count is for the number of times a "new" activity was started, or if
> it was simply a resumption of the previous instance. Walter had
> indicated that there is some movement in this direction to gather end
> times. 

This would indeed be very useful. Is anyone working on implementing these
features?

> Yes, the methods that use the datastore as a source rely on the
> Journal, but the sugar-stats system does not. I believe it collects in
> GNOME as well.

Have you done any processing, analysis, or visualization of the sugar-stats
data? Is that something that you are planning to integrate into OLPC Dashboard?

> 4) The reporting can be done via visualization and/or by
> generating periodic reports. The reporting should be specific to the
> person(s) looking at it. No magic there.

I think that many questions (some of which we already mentioned above) can be
answered with reports and visualizations, which are not deployment specific. For
example, those you are targeting with OLPC dashboard.

> 
> How the data will be used remains to be seen. I have not seen it being
> used in any of the projects that I know of. If others have seen/done
> so, it would help to hear from them. I know that in conversations and
> presentations to decision makers, the usual sore point is "can you
> show us what you have so far?" For Jamaica, we have used a basic
> exploratory approach on the Journal data, corroborated with structured
>  interviews with parents, teachers, etc. So, for instance, the data we
> have shows a relatively large frequency of use of TuxMath (even with
> different biases). However, we have qualitative evidence that supports
> both usage of TuxMath and improvement in numeracy (standardized test).
> We can support strong(er) correlation, but cannot really establish
> causality. The three data points put together make for a compelling
> case. 

I think this is a really important point to emphasize: None of these approaches
to evaluation provides the complete picture, but all of these used in aggregate
can provide useful insights. Here at OLE Nepal, we already use standardized
testing to compare students' performance before and after the program launch. We
also follow up with teachers through conversations and surveys on regular
support visits. I agree with Sameer that supplementing those with statistical
data can make for a much stronger case.

Martin

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: [Sugar-devel] The quest for data

2014-01-06 Thread Martin Dluhos
On 3.1.2014 04:09, Sameer Verma wrote:
> Happy new year! May 2014 bring good deeds and cheer :-)
> 
> Here's a blog post on the different approaches (that I know of) to data
> gathering across different projects. Do let me know if I missed anything.
> 
> cheers,
> Sameer
> 
> http://www.olpcsf.org/node/204

Thanks for putting together the summary, Sameer. Here is more information about
my xo-stats project:

The project's objective is to determine how XOs are used in Nepalese
classrooms, but I intend for the implementation to be general enough
that it can be reused by other deployments as well. As with other projects
you've mentioned, I separated the project into four stages:

1) collecting data from the XO Journal backups on the schoolserver
2) extracting the data from the backups and storing it in an appropriate format
for analysis and visualization
3) statistically analyzing and visualizing the captured data
4) formulating recommendations for improving the program based on the analysis.

Stage 1 is already implemented on both the server and the client
side, so I first focused on the next step of extracting the data. Initially, I
wanted to reuse an existing script, but I eventually found that none of them
were general enough to meet my criteria. One of my goals is to make the script
work on any version of Sugar.

Thus, I have been working on process_journal_stats.py, which takes a '/users'
directory with XO Journal backups as input, pulls out the Journal metadata, and
writes it to a CSV or JSON file.

Journal backups can be in a variety of formats depending on the version
of Sugar. The script currently supports the backup format present in Sugar versions
0.82-0.88, since the laptops distributed in Nepal are XO-1s running Sugar
0.82. I am planning to add support for later versions of Sugar in the next
version of the script.

The script currently supports two ways to output statistical data. To produce
all statistical data from the Journal, one row per Journal record:

process_journal_stats.py all

To extract statistical data about the use of activities on the system, use:

process_journal_stats.py activity

Full documentation of all the options is in the README at:

https://github.com/martasd/xo-stats
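
For a sense of what the extraction step involves, here is a stripped-down
sketch. The layout it assumes (one metadata/ directory per Journal entry,
one file per property) is only one of the layouts the real script has to
handle, so treat the paths and property names as illustrative:

import csv
import os
import sys

FIELDS = ['user', 'uid', 'activity', 'title', 'mtime', 'mime_type']

def iter_entries(users_dir):
    # Walk the backup tree; each Journal entry is assumed to keep its
    # metadata as one file per property inside a metadata/ directory.
    for root, dirs, files in os.walk(users_dir):
        if os.path.basename(root) != 'metadata':
            continue
        entry = {'user': os.path.relpath(root, users_dir).split(os.sep)[0],
                 'uid': os.path.basename(os.path.dirname(root))}
        for prop in files:
            if prop in FIELDS:
                with open(os.path.join(root, prop), errors='replace') as f:
                    entry[prop] = f.read().strip()
        yield entry

# Usage: python extract_sketch.py /path/to/users
with open('journal_stats.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction='ignore')
    writer.writeheader()
    for entry in iter_entries(sys.argv[1]):
        writer.writerow(entry)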

One challenge of the project has been determining how much data processing to do
in the Python script and what to leave for the data analysis and visualization
tools later in the workflow. For now, I stopped adding features to the script
and I am evaluating the most appropriate tools to use for visualizing the data.

Here are some of the questions I am intending to answer with the visualizations
and analysis:

* How many times do installed activities get used? How does activity use
differ over time?
* Which activities are children using to create files? What kind of files are
being created?
* Which activities are being launched in share-mode and how often?
* During which part of the day do children play with the activities?
* How does the set of activities used evolve as children age?

I am also going to be looking at how answers to these questions vary from class to
class, school to school, and region to region.
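
As a taste of the analysis stage, the first question above can be answered
from the script's CSV output with a few lines of pandas; the 'activity' and
'mtime' column names are assumptions to be adjusted to the script's actual
output:

import pandas as pd

df = pd.read_csv('journal_stats.csv', parse_dates=['mtime'])

# Entries per activity per month: one table, ready for charting.
use_over_time = (df
                 .assign(month=df['mtime'].dt.to_period('M'))
                 .groupby(['month', 'activity'])
                 .size()
                 .unstack(fill_value=0))
print(use_over_time)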

As Martin Abente and Sameer mentioned above, our work needs to be informed by
discussions with the stakeholders: children, educators, parents, school
administrators, etc. We do have educational experts among the staff at OLE, who
have worked with more than 50 schools altogether, and I will be talking to them
as I look beyond answering the obvious questions.

For visualization, I have explored using LibreOffice and SOFA, but neither of
those was flexible enough to allow customization of the output beyond a few
rudimentary options, so I started looking at various Javascript libraries, which
are much more powerful. Currently, I am experimenting with Google Charts, which
I found the easiest to get started with. If I run into limitations with Google
Charts in the future, others on my list are InfoVis Toolkit
(http://philogb.github.io/jit) and HighCharts (http://highcharts.com). Then,
there is also D3.js, but that's a bigger animal.
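
To show how little glue Google Charts needs, here is a sketch that writes a
self-contained HTML page from aggregated counts (the numbers are
placeholders). One caveat, which Sameer raised earlier in the thread: the
gstatic loader is fetched online, so an offline schoolserver would need a
locally hosted charting library instead.

import json

counts = {'TuxMath': 120, 'Paint': 85, 'Write': 60}  # placeholder numbers
rows = json.dumps([['Activity', 'Launches']] + sorted(counts.items()))

html = """<html><head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(function () {
  var data = google.visualization.arrayToDataTable(%s);
  new google.visualization.BarChart(document.getElementById('chart'))
      .draw(data, {title: 'Activity launches'});
});
</script></head>
<body><div id="chart" style="width: 640px; height: 400px"></div></body>
</html>""" % rows

with open('activity_chart.html', 'w') as f:
    f.write(html)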

Alternatively or perhaps in parallel, I am also willing to join efforts to
improve the OLPC Dashboard, which is trying to answer very similar questions to
mine.

I am looking forward to collaborating with everyone who is interested in
exploring ways to analyze and visualize OLPC/Sugar data in an interesting and
meaningful way.

Cheers,
Martin
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel