Re: [Foundation-l] [Wiki-research-l] How to improve quality of Wikipedia?
--- On Sun, 10/10/10, Federico Leva (Nemo) nemow...@gmail.com wrote:
From: Federico Leva (Nemo) nemow...@gmail.com
Subject: Re: [Wiki-research-l] [Foundation-l] How to improve quality of Wikipedia?
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
CC: Research into Wikimedia content and communities wiki-researc...@lists.wikimedia.org
Date: Sunday, 10 October 2010, 19:45

Przykuta, 10/10/2010 19:17:
Old talk pages with solved problems are deleted.

This is extremely strange. Talk pages are part of the article history. Ortega's thesis should be updated, perhaps. «The combination of a very active cohort of bots, together with the very low ratio of talk pages, indicates that the Polish language version is not following the same organizational pattern found in other language editions. Such a low ratio of talk pages points out the little effort undertaken on coordination actions and discussion about article contents in the Polish version.» (http://libresoft.es/Members/jfelipe/phd-thesis , p. 91)

Thanks for pointing this out, Nemo. I might have missed the thread on Foundation-l otherwise :). Well, at least this gives a partial explanation for the very low ratio of available talk pages, though I personally think it is not enough to explain such a really, really low figure. In fact, I agree that this is very strange. As far as I have understood up to now, talk pages also serve as a backup log of past discussions for new users approaching an article for the first time. If this is true, then on PL a new editor of an article might run the risk of raising again an issue or a contribution that was already discussed a year ago by editors working on that article.

Best,
Felipe.

Talk pages of dynamic IPs are deleted too (we wait ~6 months and delete them by bot). I don't know whether this is standard behavior in other Wikipedias or specific to pl.

This isn't very relevant.
On it.wiki they used to be deleted by (unapproved) bots (run under sysop accounts); for some years now they have simply been replaced with a welcome IP template every month if they are more than a month old.

* Finally, Polish Wikipedia has fewer active users than any of the next three smaller Wikipedias (Italian, Japanese and Spanish), which might be significant here. Fewer users talk less, so there are fewer natural discussion pages.

True: we have only ~300 very active users. We prefer to work in the main namespace. One of our most frequently used slogans is "we work here, not talk". We spend much of our time on flagged revisions, so we are sure that 90% are free of vandalism.

This is very important. The real question is: how can pl.wiki be so big (and useful, looking at pageviews) with such a small editor base? It seems a good result.

Nemo

___ Wiki-research-l mailing list wiki-researc...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: [Foundation-l] Wikipedia is not bureaucracy, said bureaucrat and deleted article
--- On Thu, 26/11/09, Milos Rancic mill...@gmail.com wrote:
From: Milos Rancic mill...@gmail.com
Subject: [Foundation-l] Wikipedia is not bureaucracy, said bureaucrat and deleted article
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Thursday, 26 November 2009, 11:36

Read http://news.slashdot.org/story/09/11/25/160236/Contributors-Leaving-Wikipedia-In-Record-Numbers

The article is based on Felipe Ortega's research. There are two claims in this article:

Hello, Milos, all.

1. The English-language version of Wikipedia suffered a net loss of 49,000 contributors, compared with a loss of about 4,900 during the same period in 2008

Please read the following blog post, which I have already reviewed in agreement with Erik Moller, explaining the difference between retaining editors (the numbers displayed in the original WSJ article) and the monthly number of active editors: http://blog.wikimedia.org/2009/11/26/wikipedias-volunteer-story/

2. There is an increase of bureaucracy and rules.

which is becoming increasingly difficult, says Andrew Dalby, author of The World and Wikipedia: How We are Editing Reality and a regular editor of the site. 'There is an increase of bureaucracy and rules. Wikipedia grew because of the lack of rules. That has been forgotten. The rules are regarded as irritating and useless by many contributors.'

This is Andrew Dalby's quote, not mine.

I would like to hear a clarification from Felipe of the claim that 49,000 contributors left Wikipedia. If it is so, then en.wp has around ten times more fluctuation of contributors. (According to statistics [1], there are no significant changes between the first months of 2008 and 2009.) If it is so, we should try to understand why it is so.

The second claim produced a lot of *relevant* testimonies about Wikipedian work. Please read them. For the first time I see a highly relevant discussion on Slashdot about Wikipedia's structure. All of them are talking about the current problems of Wikipedia.
Problems are now visible at such a level that the mainstream media are talking about them [2]. I would say that we need some radical moves to stop the current negative trends inside the projects. Which ones? I don't know. We should think about them. (Actually, I have a couple of possible changes in mind, which are not radical. However, their implementation would need radical changes. Because of bureaucracy.)

[1] - http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm
[2] - http://technology.timesonline.co.uk/tol/news/tech_and_web/the_web/article6930546.ece
Re: [Foundation-l] WSJ on Wikipedia
--- On Tue, 24/11/09, Nikola Smolenski smole...@eunet.rs wrote:
From: Nikola Smolenski smole...@eunet.rs
Subject: Re: [Foundation-l] WSJ on Wikipedia
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Tuesday, 24 November 2009, 08:50

Felipe Ortega wrote:
Wikipedia just entered a new phase. Our responsibility (as long-time Wikipedia researchers) is to find out the causes (not necessarily negative, please read a PDF summarizing a recent electronic interview for the Strategy plan, at http://strategy.wikimedia.org/wiki/Interviews) and prevent any possible problems as much in advance as possible.

Why not also conduct interviews with Wikipedia editors, either a random sample or targeted people (for example, people who made significant contributions and then stopped)?

Yeah, this is another interesting approach. The problem with it is that it's difficult to contact former editors/admins once they have definitively abandoned the project (in my experience). Other strategies are too aggressive (like spamming talk pages) and should always be avoided. We had an interesting discussion about this issue in an Open Space session at WikiSym 2009. It resulted in a new project to try to improve these communication mechanisms: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

Regards,
F.
Re: [Foundation-l] WSJ on Wikipedia
--- On Mon, 23/11/09, Steven Walling steven.wall...@gmail.com wrote:
From: Steven Walling steven.wall...@gmail.com

I didn't see any of the graphs from the piece or any conclusions in the thesis which are equivalent to the statements made in the Journal, so this must be new research.

Hi, Steven. I'm Felipe Ortega, the author of the numbers and graphs you're mentioning. Yes, these are recently updated results from our long-running research line on the Wikipedia community. They were first presented at WikiSym 2009 and, before that, at a conference in the Web Science Lecture Series at Georgia Tech (both last October).

As always, I want to stress that, even though the numbers don't look good for the long-term sustainability of the project, I struggle daily against fatalist claims or headlines speculating about the end of the project. Wikipedia has just entered a new phase. Our responsibility (as long-time Wikipedia researchers) is to find out the causes (not necessarily negative, please read a PDF summarizing a recent electronic interview for the Strategy plan, at http://strategy.wikimedia.org/wiki/Interviews) and prevent any possible problems as much in advance as possible.

As usual, I'm at your disposal for any comments or clarifications.

Best,
Felipe.

Steven
Re: [Foundation-l] WSJ on Wikipedia
--- On Mon, 23/11/09, Andrew Lih andrew@gmail.com wrote:
From: Andrew Lih andrew@gmail.com
Subject: Re: [Foundation-l] WSJ on Wikipedia
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Monday, 23 November 2009, 22:49

On Mon, Nov 23, 2009 at 12:28 PM, David Moran fordmadoxfr...@gmail.com wrote:
I think a lot of attention is paid to the way the technical interface is hostile to newbies, and making that more user-friendly and democratic is certainly a concern that needs to be addressed. But I think the tendency of older users, or certain editorially minded users, to squat on the project and bludgeon newer users with policy pages rolled up into sticks is just as much, if not more, responsible for driving away the new users we need to replenish our ranks.

At Wikimania 2009 it was noted that there were declines across different language editions, which started happening at the same time. This suggests that it's not simply the completeness of a particular edition at play here, as the development cycle of each language edition should be fairly distinct. Rather, the sharp declines across languages indicate that it could be a platform feature (i.e. software, policy, et al.), an interdependency across the language groups, or some other outlying variable. The session at Wikimania about this: http://wikimania2009.wikimedia.org/wiki/Proceedings:221

I'll be giving a talk at SXSW 2010 about this next year, and I welcome any and all theories and ideas on what areas of research to pursue. http://bit.ly/8Hh52

Thank you very much for your comments, Andrew. I'm afraid I won't be able to attend SXSW 2010, but I will certainly attend Wikimania 2010 next year, and I hope we'll have some time to reflect on these issues.

Best,
Felipe.
-Andrew Lih
Re: [Foundation-l] Analysis of statistics
--- On Sat, 25/7/09, John at Darkstar vac...@jeb.no wrote:
From: John at Darkstar vac...@jeb.no
Subject: Re: [Foundation-l] Analysis of statistics
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Saturday, 25 July 2009, 3:47

I asked a source whether they could grant us access to some statistics on user behaviour within social media. The time series starts well before Nupedia.

That would be great, John. Though Wikipedia's peculiarities should be taken into account, long time series would allow interesting comparisons, in particular regarding the trends we may expect in the future, based on patterns already observed in other scenarios with a wider timespan.

Best,
Felipe.

John

Felipe Ortega wrote:
--- On Fri, 24/7/09, Milos Rancic mill...@gmail.com wrote:
From: Milos Rancic mill...@gmail.com
Subject: Re: [Foundation-l] Analysis of statistics
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Friday, 24 July 2009, 5:25

Whatever means in the official statistics. It would be good to have numbers about newcomers and those who made 10 or 100 edits, so we may compare how we attract attention over time. However, I think that those numbers have been relatively stable in the past couple of years (let's say, from 2005 or so).

You can check more precise figures and graphs in my thesis, with general survival statistics for all logged-in editors and core editors (the top 10% most active editors in each month), from the beginning until Dec. 2007, in the top-ten language versions (at that time).
http://libresoft.es/Members/jfelipe/phd-thesis (page)
http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis (doc)

As for the percentages of users by age, education level, etc., my impression is that opinions from experienced community members are often well oriented. But they're only opinions.
Until we get the results of the general survey, we won't have a clear picture of the current recruitment targets for all versions. Nevertheless, according to our updates, it seems that the situation has not improved since Jan 2008.

Best,
Felipe.
Re: [Foundation-l] Analysis of statistics
This is a good point, Milos. Quantity and quality are more closely related than we might have thought initially. For instance:

* Most Featured Articles in all top-ten language versions needed more than 1,000 days (about 3 years) to reach that level.
* Most editors contributing to FAs were highly experienced editors, meaning they had been participating in Wikipedia for more than 2.5 or 3 years. And these editors tend to be very active ones (though they do not necessarily get 'sysop' or other special privileges).

Recall that more than 50% of editors abandoned after approx. half a year, in all versions we studied. Therefore, highly experienced editors are taking care of top-quality content, probably because they know the guidelines, procedures and daily workflows of the community better than many other editors. Of course, their knowledge (about the topics they contribute to) also matters. But I believe that the first condition is also critical. And you get to that point with time, interacting with Wikipedia and the community.

As a result, any attempt to improve the experience of newcomers as they start to contribute is invaluable. I've read your comments about chats with sysops or articles' main editors. I've also read about training environments (customized, friendlier sandboxes, etc.). So, all this makes *a lot of sense* in the current situation. Not because of quantity, but to improve *quality*.

Best,
Felipe.

--- On Sat, 25/7/09, Milos Rancic mill...@gmail.com wrote:
From: Milos Rancic mill...@gmail.com
Subject: Re: [Foundation-l] Analysis of statistics

So, to give the answer about quantity vs. quality: we need quantity to have sustainable community development, or even just a sustainable stagnation. We shouldn't be shy about saying that quantity is very important to us, because with it we are able to build quality. And, yes, it is possible that quality brings quantity. This thread is about that: we have to think about how to do that.
If we don't think (thinking=quality) about how to bring quantity while our quantity is dropping, we are at a dead end.
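As a rough sketch of how a figure like "more than 50% of editors abandoned after approx. half a year" can be computed, the Python below takes each editor's first and last edit dates and reports the share whose active span is shorter than a threshold. It assumes a hypothetical mapping of editor to (first_edit_date, last_edit_date) and a 182-day threshold; the function name share_abandoned_within is made up, and this is not the method used in the thesis.

```python
from datetime import date


def share_abandoned_within(first_last, days=182):
    """Fraction of editors whose activity span is shorter than `days`.

    `first_last` maps editor -> (first_edit_date, last_edit_date), both
    datetime.date objects (hypothetical input format).
    """
    if not first_last:
        return 0.0
    # An editor counts as "abandoned early" when last - first < threshold.
    short = sum(
        1 for first, last in first_last.values() if (last - first).days < days
    )
    return short / len(first_last)
```

This naive proxy treats editors who were still active when the data ends as having left. A proper survival analysis (for example, a Kaplan-Meier estimator) accounts for that censoring.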
Re: [Foundation-l] Analysis of statistics
--- On Fri, 24/7/09, Milos Rancic mill...@gmail.com wrote:
From: Milos Rancic mill...@gmail.com
Subject: Re: [Foundation-l] Analysis of statistics
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Friday, 24 July 2009, 5:25

Whatever means in the official statistics. It would be good to have numbers about newcomers and those who made 10 or 100 edits, so we may compare how we attract attention over time. However, I think that those numbers have been relatively stable in the past couple of years (let's say, from 2005 or so).

You can check more precise figures and graphs in my thesis, with general survival statistics for all logged-in editors and core editors (the top 10% most active editors in each month), from the beginning until Dec. 2007, in the top-ten language versions (at that time).
http://libresoft.es/Members/jfelipe/phd-thesis (page)
http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis (doc)

As for the percentages of users by age, education level, etc., my impression is that opinions from experienced community members are often well oriented. But they're only opinions. Until we get the results of the general survey, we won't have a clear picture of the current recruitment targets for all versions. Nevertheless, according to our updates, it seems that the situation has not improved since Jan 2008.

Best,
Felipe.
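The core-editor definition above (the top 10% most active editors in each month) can be sketched in Python. This is a minimal illustration, not the thesis's code: it assumes revision records arrive as hypothetical (editor, month) pairs, that "most active" means edit count, that the cut-off is rounded up so every month keeps at least one core editor, and that ties are broken alphabetically. The function name core_editors_by_month is made up for this example.

```python
import math
from collections import Counter, defaultdict


def core_editors_by_month(revisions, share=0.10):
    """Return {month: set_of_editors} with the top `share` of each month's editors.

    `revisions` is an iterable of (editor, month) pairs, one per edit;
    `month` can be any hashable label such as "2007-12".
    """
    edits = defaultdict(Counter)  # month -> edits per editor in that month
    for editor, month in revisions:
        edits[month][editor] += 1

    core = {}
    for month, counter in edits.items():
        # Round up so that months with few editors still get one core editor.
        k = max(1, math.ceil(len(counter) * share))
        # Rank by edit count (descending), then by name for deterministic ties.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        core[month] = {editor for editor, _ in ranked[:k]}
    return core
```

Other reasonable definitions exist (for instance, the smallest group of editors producing a given share of the month's edits), so the numbers it produces need not match the published figures.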
Re: [Foundation-l] Wikipedia: A quantity analysis
Hello. I'm Felipe Ortega, the URJC researcher who authored this thesis. Thank you for pointing this out, though I believe I already CC'd the Foundation mailing list some months ago, immediately after the manuscript was published on our website.

I would like to take this opportunity to comment on the considerable coverage that the results of the thesis have received from the Spanish national mass media this week. Last Wed., URJC published an official press release about it: http://www.urjc.es/z_files/ai_noti/ai05/noticia_completa.php?ID=1034

Just a few hours later, the main Spanish national newspapers followed up on the news:
ABC http://www.abc.es/20090709/medios-redes-web/wikipedia-estanca-caida-numero-200907091136.html
El Mundo (featured on the main page http://elmundo.es, as of Thu. July 9) http://www.elmundo.es/elmundo/2009/07/09/navegante/1247132861.html
El País (also on the main page, technology section, as of Thu. July 9) http://www.elpais.com/articulo/tecnologia/Wikipedia/pierde/editores/elpeputec/20090709elpeputec_1/Tes

It has also been published by Europa Press, EFE news, and other major national news agencies. According to Google, right now we're being reported on more than 30 different news sites :).

Yesterday, Fri. 10, it also received coverage from some important radio stations (Onda Madrid, Punto Radio) in the Madrid region, as well as from the national radio broadcasting consortium Cadena Ser on the night of Thu. 9 (Hora 25, prime-time news program). Some of these included live interviews to highlight the main results.

Finally, I would really like to stress that, despite the obvious tendency of some journalists to seek a sensationalist headline like "could it be the end of Wikipedia?", I have tried as much as I could to explicitly avoid this, and just spoke about Wikipedia reaching a new stabilized stage in both the number of active editors and number of revisions per month [...]
the main cause could be that the project is losing, each month, more authors than the number of new contributors who arrive to help for the first time.

In fact, we would like to keep working, with the help of other academic institutions, researchers and support from the community, to better understand the content creation process in Wikipedia, especially regarding the improvement of quality, and to find better strategies to retain editors for longer periods of time.

All the best,
Felipe.

--- On Thu, 9/7/09, Crazy Lover always_yours.fore...@yahoo.com wrote:
From: Crazy Lover always_yours.fore...@yahoo.com
Subject: [Foundation-l] Wikipedia: A quantity analysis
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Thursday, 9 July 2009, 8:39

Below is a doctoral thesis that quantitatively analyzes the evolution of the top ten Wikipedias and their problems (in English): http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis

C.m.l.
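The explanation above (more authors leaving each month than new contributors arriving) is a statement about monthly flows. A minimal Python sketch, assuming each editor is summarized by a hypothetical (first_month, last_month) pair with months as consecutive integers, counts arrivals, departures, net change and active editors per month. It also shows why a net loss of contributors and the number of monthly active editors are different quantities. Treating an editor's last active month as their departure month is a simplifying assumption made here; the original research may define departure differently.

```python
from collections import Counter


def monthly_flow(first_last_month):
    """Per-month arrivals, departures, net change and active editors.

    `first_last_month` maps editor -> (first_month, last_month), where months
    are integers (for example, months since the project started).
    """
    arrivals = Counter(first for first, _ in first_last_month.values())
    departures = Counter(last for _, last in first_last_month.values())
    if not first_last_month:
        return {}

    flow = {}
    for m in range(min(arrivals), max(departures) + 1):
        # An editor is active in month m if m lies within their span.
        active = sum(1 for f, l in first_last_month.values() if f <= m <= l)
        flow[m] = {
            "arrivals": arrivals[m],  # Counter returns 0 for missing months
            "departures": departures[m],
            "net": arrivals[m] - departures[m],
            "active": active,
        }
    return flow
```

A month can show a negative net change while the active count is still large, which is one reason headline "net loss" numbers and active-editor charts can look so different.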
[Foundation-l] Public repositories for research dumps
Hello. A few hours ago, a new public repository was created to host WikiXRay database dumps, containing information extracted from the public Wikipedia database dumps. The image is hosted by RedIRIS (in short, the Spanish equivalent of Kennisnet in the Netherlands).
http://sunsite.rediris.es/mirror/WKP_research
ftp://ftp.rediris.es/mirror/WKP_research

These new dumps are intended to save time and effort for other researchers, since they won't need to parse the complete XML dumps to extract all relevant activity metadata. We used mysqldump to create the dumps from our databases.

As of today, only some of the biggest Wikipedias are available. However, in the following days the full set of available languages will be ready for download. The files will be updated regularly.

The procedure is as follows:
1. Find the research dump you are interested in. Download and decompress it on your local system.
2. Create a local DB to import the information.
3. Load the dump file, using a MySQL user with insert privileges (the -p flag makes mysql prompt for the password):
    $ mysql -u user -p myDB < dumpfile.sql
And you're done.

Final warning: 3 fields in the revision table are not reliable yet:
rev_num_inlinks
rev_num_outlinks
rev_num_trans
All remaining fields/values are trustworthy (in particular rev_len, rev_num_words, and so forth).

Regards,
Felipe.
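If the dumps are distributed compressed, step 1 does not require writing the decompressed file to disk: it can be streamed straight into the mysql client. The Python sketch below assumes .gz or .bz2 compression (the actual format of the RedIRIS files is not stated here); open_dump and stream_dump are illustrative names, not part of WikiXRay.

```python
import bz2
import gzip
import shutil


def open_dump(path):
    """Open a possibly compressed SQL dump for binary reading (assumed .gz/.bz2)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def stream_dump(path, sink, chunk_size=1 << 20):
    """Copy the decompressed dump into the binary file object `sink`.

    Data is copied in chunks, so the full decompressed dump never has to
    be held in memory or written to disk.
    """
    with open_dump(path) as src:
        shutil.copyfileobj(src, sink, chunk_size)
```

For example, `sink` can be the stdin pipe of a mysql client started with subprocess.Popen and stdin=subprocess.PIPE (close that pipe and wait for the process afterwards). From a shell, the equivalent is `gunzip -c dumpfile.sql.gz | mysql -u user -p myDB`.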
Re: [Foundation-l] dumps
--- On Wed, 25/2/09, Anthony wikim...@inbox.org wrote:
From: Anthony wikim...@inbox.org
Subject: Re: [Foundation-l] dumps
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Wednesday, 25 February 2009, 5:26

On Tue, Feb 24, 2009 at 11:26 PM, Brian brian.min...@colorado.edu wrote:
Which uncompressed dump? The full-history English Wikipedia dump doesn't exist, and there doesn't seem to be any demand for this anyway.

Hmm, sorry, but I'm afraid you have missed some messages over the past year and a half on Wikitech-l, eagerly asking for the complete version of the English dump. Just to give a straightforward application: people analyzing Wikipedia from a quantitative point of view need the whole dump file, no matter what you want to examine. And believe it or not, the number of scholars (in different disciplines) focusing on this topic is growing steadily (actually, there could be many more of us if we had a stable process, updated with reasonable frequency ;) ). It's also really difficult for people like me to advocate in favor of this line of research when we have such problems, though so far we have found ways to accept these limitations (better something than nothing).

Best,
F.
Re: [Foundation-l] dumps
--- On Thu, 26/2/09, Brian brian.min...@colorado.edu wrote:
From: Brian brian.min...@colorado.edu
Subject: Re: [Foundation-l] dumps
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Date: Thursday, 26 February 2009, 12:33

Ahh ok. Anyone who wants to do processing on the full history (and there are a lot of these people!) by definition *has* to be willing to throw some money at it. It simply doesn't fit on commercial drives.

Not necessarily. For instance, WikiXRay is capable of parsing the dump file on the fly, so you don't need to decompress the whole file if you don't want to, and the result typically fits in a 6-8 GB DB (depending on the amount of data you recover), which fits perfectly on commodity hardware. On the other hand, I completely agree with you that working with the huge XML file requires specific hardware (we bought a couple of servers for that).

People *just want the data*. Many people would be willing to pay a fee.

Probably, but anyway, I would like to avoid paying a fee to access what should be publicly available (at least, until the dump process broke, it was). Some universities (including ours) have offered storage capacity and some bandwidth to distribute mirrors and improve dump availability, at no cost at all :).

I have a rare copy of the last available full text dump. Perhaps I should initiate the process myself.

Nothing prevents you from doing that (I think), and it could be a stimulus for thinking about subsequent solutions.

Best,
F.

On Wed, Feb 25, 2009 at 2:20 PM, Thomas Dalton thomas.dal...@gmail.com wrote:
2009/2/25 Brian brian.min...@colorado.edu:
What has led you to believe there is no demand for a full dump of the English Wikipedia?

He didn't say there was no demand, he said there was no demand for having it on Amazon.
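The on-the-fly parsing mentioned above can be illustrated with Python's standard library: bz2.open decompresses incrementally and xml.etree.ElementTree.iterparse processes elements as they are closed, so neither the decompressed file nor the whole XML tree has to exist at once. This is not WikiXRay's code; it is a minimal sketch that only counts revisions per page, assuming a bz2-compressed MediaWiki XML export. The helper names are illustrative.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export files use."""
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(source):
    """Yield (title, revision_count) for each <page>, reading incrementally.

    `source` is a path or binary file object for a bz2-compressed XML dump.
    """
    with bz2.open(source, "rb") as stream:
        title, count = None, 0
        for _, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "revision":
                count += 1
                elem.clear()  # drop revision text as soon as it is counted
            elif name == "page":
                yield title, count
                title, count = None, 0
                elem.clear()  # free the page's children
```

Clearing each page keeps memory roughly flat, although the emptied page elements still hang off the root element; for very large dumps one would also clear the root periodically.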