Re: [CODE4LIB] irc back channel logs

2011-02-12 Thread Jay Luker
Hi Eric,

The entire log for the conference can be accessed here:
http://irc.code4lib.org/c4l11/static/logs/irclog

I would have hoped that you could fetch the whole log for each day by
manipulating the url, e.g., 00:00-23:59, but for some reason that's
not working. :-(
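
A minimal Python sketch of the URL manipulation described above. It only builds logslice URLs following the pattern of the one slice URL quoted in this thread (date, then start-end times). The 20-minute window, the function name, and the idea that smaller windows would succeed where 00:00-23:59 fails are all assumptions, not documented behaviour of the server.

```python
from datetime import datetime, timedelta

# URL pattern copied from the slice link quoted in this thread.
BASE = "http://irc.code4lib.org/c4l11/logslice"


def slice_urls(day, minutes=20):
    """Yield logslice URLs covering one day in fixed-size windows.

    day is a YYYYMMDD string; the window size is an assumption.
    """
    start = datetime.strptime(day, "%Y%m%d")
    end_of_day = start + timedelta(hours=23, minutes=59)
    step = timedelta(minutes=minutes)
    while start < end_of_day:
        # Clamp the last window so it never runs past 23:59.
        stop = min(start + step, end_of_day)
        yield f"{BASE}/{day}/{start:%H:%M}-{stop:%H:%M}"
        start = stop


if __name__ == "__main__":
    for url in slice_urls("20110210"):
        print(url)
```

Nothing here touches the network; the printed URLs would still need to be fetched and checked by hand.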

Interested to see what you do with it.

--jay

On Fri, Feb 11, 2011 at 11:51 PM, Eric Lease Morgan emor...@nd.edu wrote:
 I have seen the cool timelines made from the IRC back channel logs [1, 2], 
 but I'm wondering how I can download the whole kit and caboodle. Such a thing 
 is ripe for text mining.  ;-)

 [1] 2009 timeline - http://irc.code4lib.org/c4l09/
 [2] some timeline data - 
 http://irc.code4lib.org/c4l11/logslice/20110210/11:40-12:00

 --
 Eric Lease Morgan



Re: [CODE4LIB] irc back channel logs

2011-02-12 Thread Patrick Berry
On Sat, Feb 12, 2011 at 6:01 AM, Jay Luker lb...@reallywow.com wrote:

 Hi Eric,

 The entire log for the conference can be accessed here:
 http://irc.code4lib.org/c4l11/static/logs/irclog

 I would have hoped that you could fetch the whole log for each day by
 manipulating the url, e.g., 00:00-23:59, but for some reason that's
 not working. :-(

 Interested to see what you do with it.

 --jay


I blame all typos on the iPad
auto-you-didn't-really-want-to-say-that-did-you-correct.

Pat


Re: [CODE4LIB] irc back channel logs

2011-02-12 Thread Eric Lease Morgan
On Feb 12, 2011, at 9:01 AM, Jay Luker wrote:

 The entire log for the conference can be accessed here:
 http://irc.code4lib.org/c4l11/static/logs/irclog

Thank you. Downloaded, and almost done parsing...

-- 
Eric Morgan


Re: [CODE4LIB] irc back channel logs [hacks]

2011-02-12 Thread Eric Lease Morgan
I have written a few hacks allowing me to do rudimentary text mining against 
the logs. [1] From readme.txt:

  This directory contains a number of files and scripts allowing
  one to do a bit of text mining against the Code4Lib conference
  IRC log files for 2011. This is just a beginning, and the
  directory includes:
  
* irclog.txt - the raw log file downloaded from
  http://irc.code4lib.org/c4l11/static/logs/irclog
  
* log2db.pl - reads the raw log and outputs a tab-delimited
  file with three columns (date, name, text)
  
* irclog.db - the output of log2db.pl

* count.pl - outputs the number of names (n), increases (i),
  decreases (d), URLs (u), and commands (c) found in the log;
  useful for seeing what is hot and what is not.
  
* ngrams.pl - given an integer (n), outputs the most frequent
  n-length phrases; useful to see what words and phrases are 
  used most frequently

* concordance.pl - a KWIC (keyword-in-context) index; the simplest of search engines

* readme.txt - this file
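
A rough Python sketch of what the ngrams.pl and concordance.pl entries describe, working from a tab-delimited file with the three columns listed above (date, name, text). This is not the Perl code itself: the function names, the tokenising regex, the bracketed keyword display, and the sample rows are all made up for illustration.

```python
import csv
import io
import re
from collections import Counter


def read_rows(handle):
    """Yield (date, name, text) from a tab-delimited file like irclog.db."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) == 3:  # skip anything that is not a three-column row
            yield tuple(row)


def ngrams(texts, n):
    """Count n-word phrases across an iterable of message texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenisation: lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def concordance(texts, keyword, width=30):
    """Return keyword-in-context lines with `width` characters on each side."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines


# Made-up sample rows standing in for irclog.db.
sample = io.StringIO(
    "2011-02-10\talice\tthe live stream is down\n"
    "2011-02-10\tbob\tis the live stream back yet\n"
)
texts = [text for _date, _name, text in read_rows(sample)]
print(ngrams(texts, 2).most_common(3))
print("\n".join(concordance(texts, "stream", width=12)))
```

To run it against real data, pass an open file handle for the tab-delimited file to read_rows instead of the sample.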
  
  Using these tools one can see that:
  
* Zoia had the most to say
* mbklein's karma was increased the most
* Zoia's karma was decreased the most
* the most popular URL passed around regarded social activities
* we tried to sing as many as 196 songs, closely followed by anagrams
* 28 of the songs weren't found
* live streams were mentioned frequently
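
Tallies like the ones above could be produced along these lines. This is a Python sketch, not count.pl, and it rests on assumptions that are not confirmed anywhere in this thread: that karma changes appear as name++ and name--, that bot commands start with "@", and that URLs start with http:// or https://. The sample rows and the @sing command are invented.

```python
import re
from collections import Counter

# Assumed conventions (not confirmed by the thread):
KARMA = re.compile(r"(\S+?)(\+\+|--)(?=\s|$)")  # name++ or name--
URL = re.compile(r"https?://\S+")


def tally(rows):
    """Count speakers, karma changes, URLs, and commands in (date, name, text) rows."""
    names, ups, downs, urls, commands = (Counter() for _ in range(5))
    for _date, name, text in rows:
        names[name] += 1
        for target, op in KARMA.findall(text):
            (ups if op == "++" else downs)[target] += 1
        urls.update(URL.findall(text))
        if text.startswith("@"):  # assumed command prefix
            commands[text.split()[0]] += 1
    return {"names": names, "increases": ups, "decreases": downs,
            "urls": urls, "commands": commands}


# Invented rows for illustration only.
rows = [
    ("2011-02-10", "alice", "mbklein++ for the talk"),
    ("2011-02-10", "bob", "@sing yesterday"),
    ("2011-02-10", "alice", "zoia-- see http://example.org/social"),
]
result = tally(rows)
for label, counter in result.items():
    print(label, counter.most_common(3))
```

Each value is a Counter, so most_common() gives the "what is hot and what is not" ordering directly.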


I have to go shovel snow now...

[1] initial hacks - http://bit.ly/gMO4op

-- 
Eric Lease Morgan