--- On Thu, 11/11/10, Diederik van Liere <[email protected]> wrote:
> From: Diederik van Liere <[email protected]>
> Subject: Re: [Wiki-research-l] Editor Trends Study - Improving the tool
> To: [email protected]
> Date: Thursday, November 11, 2010 23:44
> Dear Felipe,
>
> We did investigate other tools before deciding to embark on this new
> project; as you rightly point out, we should minimize code overlap.
> Pywikipediabot is an editing tool as far as I know, and your tool,
> WikiXRay, has definitely proven itself. However, I believe that a
> NoSQL solution will give better performance than SQL databases, and
> that has been one of the main reasons to write this tool.
>
> I am not sure a separate mailing list is required; at the moment
> it's not, but thanks for the suggestion. I have added the SVN link.
>

Thanks, Diederik.

I'm also curious about testing the performance of MongoDB. I admit I haven't tried this kind of database yet. Will check the SVN.

Best,
F.

> Best,
>
> Diederik
>
> > --- On Wed, 10 Nov 2010, Diederik van Liere <[email protected]> wrote:
> >
> > From: Diederik van Liere <[email protected]>
> > Subject: [Wiki-research-l] Editor Trends Study - Improving the tool
> > To: [email protected]
> > Date: Wednesday, November 10, 2010 00:02
> >
> > Hi, Diederik,
> >
> > I'm also glad to see progress in this project. Some comments inline.
> >
> > Dear researchers,
> >
> > Recently, we started the Editor Trends Study
> > (http://strategy.wikimedia.org/wiki/Editor_Trends_Study).
> > The goal of this study is to get a better understanding of the community
> > dynamics within the different Wikipedia projects.
> >
> > Part of this project consists of developing a tool
> > (http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software)
> > that parses a Wikipedia dump file, extracts the required information,
> > stores it in a database and exports it to a CSV file. This CSV file can
> > then be used in a statistical program such as R, Stata or SAS.
> >
> > Well, I would have expected the team to search first for open source code
> > already available that implements at least some (if not all) of the
> > planned functionality.
> >
> > Some examples are my own tool, WikiXRay, and Pywikipediabot (which, AFAIK,
> > now also includes a fast parser for Wikipedia dump files).
> >
> > For my tool, I now use git for version control, and you can use either of
> > the two available repos (the official one at libresoft, or the mirror at
> > Gitorious):
> >
> > http://git.libresoft.es/WikixRay/
> > http://gitorious.org/wikixray/wikixray
> >
> > They might not be the best software available, but I guess they can help
> > to solve some problems, or at least help you speed up development and
> > avoid starting from scratch.
> >
> > We are looking for some volunteers who would enjoy testing the tool. You
> > don't need to be a software developer (although it helps :)) to help us;
> > some patience, a bit of time and a fairly recent computer are all you need.
> > You should be comfortable installing programs, working with a command-line
> > interface, and have basic Subversion experience. Python experience is a
> > real bonus!
> >
> > The testing will focus on getting the tool to run without any supervision.
> > For more background information, have a look at:
> >
> > http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software
> >
> > Perhaps you're going to provide this info later, but I don't see the links
> > to your SVN repo (only [] ).
> >
> > We are testing the tool with the largest Wikipedia projects, so if you
> > would like to replicate the analysis on your own favorite Wikipedia project
> > or help improve the quality of the tool, then please contact me off-list.
> >
> > I think it would be more effective to have a separate public list to which
> > people specifically interested in this tool can subscribe (like the one we
> > already have exclusively for XML dumps).
> >
> > This should significantly reduce the number of duplicated bug reports and
> > comments, since other people can learn about known issues.
> >
> > Hope this helps.
> >
> > Best,
> > Felipe.
> >
> > Best,
> >
> > Diederik

_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
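For readers wondering what the "parse a dump, extract the required information, export to CSV" pipeline from the announcement looks like in practice, here is a minimal sketch in Python (the language the testers are told is a bonus). It is not the Editor Trends code. The sample dump fragment, the helper names `local`, `count_edits` and `to_csv`, and the choice of per-editor edit counts as "the required information" are all hypothetical. The tool itself reportedly stores data in MongoDB, while this sketch keeps everything in an in-memory `Counter` so it runs with the standard library only.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, tiny dump fragment. Real dumps declare a versioned MediaWiki
# export namespace, which ElementTree prepends to every tag as "{ns}".
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <contributor><username>Alice</username><id>10</id></contributor>
    </revision>
    <revision>
      <id>2</id>
      <contributor><username>Bob</username><id>11</id></contributor>
    </revision>
    <revision>
      <id>3</id>
      <contributor><username>Alice</username><id>10</id></contributor>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_edits(xml_source):
    """Stream over <revision> elements and count edits per username.

    Anonymous edits carry an <ip> element instead of <username>,
    so this sketch simply skips them.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local(elem.tag) == "revision":
            for child in elem.iter():
                if local(child.tag) == "username" and child.text:
                    counts[child.text] += 1
            elem.clear()  # release the revision's children to bound memory use
    return counts


def to_csv(counts):
    """Return 'editor,edits' rows, ready for R, Stata or SAS."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["editor", "edits"])
    for editor, edits in sorted(counts.items()):
        writer.writerow([editor, edits])
    return out.getvalue()


counts = count_edits(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
print(to_csv(counts))
```

Streaming with `iterparse` and clearing each finished revision is the usual way to keep memory bounded on multi-gigabyte dumps; swapping the `Counter` for a database insert (SQL or NoSQL) would be the point where the SQL vs. NoSQL performance question discussed above comes into play.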
