Re: [CODE4LIB] irc back channel logs
Hi Eric,

The entire log for the conference can be accessed here:

  http://irc.code4lib.org/c4l11/static/logs/irclog

I would have hoped that you could fetch the whole log for each day by manipulating the url, e.g., 00:00-23:59, but for some reason that's not working. :-(

Interested to see what you do with it.

--jay

On Fri, Feb 11, 2011 at 11:51 PM, Eric Lease Morgan emor...@nd.edu wrote:

> I have seen the cool timelines made from the IRC back channel logs [1, 2], but I'm wondering how I can download the whole kit and caboodle. Such a thing is ripe for text mining. ;-)
>
> [1] 2009 timeline - http://irc.code4lib.org/c4l09/
> [2] sum timeline data - http://irc.code4lib.org/c4l11/logslice/20110210/11:40-12:00
>
> --
> Eric Lease Morgan
Re: [CODE4LIB] irc back channel logs
On Sat, Feb 12, 2011 at 6:01 AM, Jay Luker lb...@reallywow.com wrote:

> Hi Eric,
>
> The entire log for the conference can be accessed here:
>
>   http://irc.code4lib.org/c4l11/static/logs/irclog
>
> I would have hoped that you could fetch the whole log for each day by manipulating the url, e.g., 00:00-23:59, but for some reason that's not working. :-(
>
> Interested to see what you do with it.
>
> --jay

I blame all typos on the iPad auto-you-didn't-really-want-to-say-that-did-you-correct.

Pat
Re: [CODE4LIB] irc back channel logs
On Feb 12, 2011, at 9:01 AM, Jay Luker wrote:

> The entire log for the conference can be accessed here:
>
>   http://irc.code4lib.org/c4l11/static/logs/irclog

Thank you. Downloaded, and almost done parsing...

--
Eric Morgan
Re: [CODE4LIB] irc back channel logs [hacks]
I have written a few hacks allowing me to do rudimentary text mining against the logs. [1] From readme.txt:

  This directory contains a number of files and scripts allowing one to do a bit of text mining against the Code4Lib conference IRC log files for 2011. This is just a beginning, and the directory includes:

    * irclog.txt - the raw log file downloaded from http://irc.code4lib.org/c4l11/static/logs/irclog
    * log2db.pl - reads the raw log and outputs a tab-delimited file with three columns (date, name, text)
    * irclog.db - the output of log2db.pl
    * count.pl - outputs the number of names (n), increases (i), decreases (d), URLs (u), and commands (c) found in the log; useful for seeing what is hot and what is not
    * ngrams.pl - given an integer (n), outputs the most frequent n-length phrases; useful to see what words and phrases are used most frequently
    * concordance.pl - a KWIC (keyword-in-context) index; the simplest of search engines
    * readme.txt - this file

  Using these tools one can see that:

    * Zoia had the most to say
    * mbklein's karma was increased the most
    * Zoia's karma was decreased the most
    * the most popular URL passed around regarded social activities
    * we tried to sing as many as 196 songs closely followed by anagrams
    * 28 of the songs weren't found
    * live streams were mentioned frequently

I have to go shovel snow now...

[1] initial hacks - http://bit.ly/gMO4op

--
Eric Lease Morgan
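
For readers who want to try something similar, here is a rough Python sketch of the counting, n-gram, and concordance ideas described in the readme. It is not a port of the Perl scripts, whose source is not shown in this thread. It assumes only the three-column, tab-delimited layout (date, name, text) that log2db.pl is said to produce. The sample rows, the function names, and the tokenizing rule are all invented for illustration.

```python
from collections import Counter
import re

# Hypothetical sample rows in the three-column layout the readme describes
# (date, name, text). The values are invented, not taken from the real log.
SAMPLE = (
    "2011-02-09T09:00:00\talice\tthe live stream is up\n"
    "2011-02-09T09:01:00\tbob\tis the live stream working for anyone\n"
    "2011-02-09T09:02:00\talice\tthe live stream is fine here\n"
)


def read_rows(text):
    """Split tab-delimited text into (date, name, text) tuples; skip malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def tokenize(message):
    """Lowercase a message and return its runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", message.lower())


def top_speakers(rows, k=3):
    """Return the k names with the most lines, as (name, count) pairs."""
    return Counter(name for _, name, _ in rows).most_common(k)


def top_ngrams(rows, n, k=3):
    """Return the k most frequent n-word phrases, counted within single messages."""
    counts = Counter()
    for _, _, message in rows:
        words = tokenize(message)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


def concordance(rows, keyword, width=2):
    """Return (name, left context, keyword, right context) for each keyword hit."""
    keyword = keyword.lower()
    hits = []
    for _, name, message in rows:
        words = tokenize(message)
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                hits.append((name, left, word, right))
    return hits


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    print(top_speakers(rows))
    print(top_ngrams(rows, 2))
    for hit in concordance(rows, "stream"):
        print(hit)
```

To run it against a real file, read the file's contents and pass them to read_rows() in place of SAMPLE. Phrases are counted within a single message only, so an n-gram never spans two speakers; that is a choice made for this sketch, not necessarily what ngrams.pl does.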