About time. The more recent Monthly Report Cards (http://reportcard.wmflabs.org/), which organize stats by region, are extremely useful, and I often use your animated history of Wikipedia to start presentations because it's so beautiful.
Bravo!
Bishakha

On Sat, Jan 4, 2014 at 7:01 AM, Sue Gardner <sgard...@wikimedia.org> wrote:
> Just wanted to share this article, because it makes me so happy!
> Erik's one of our earliest contributors and *we've* all depended on
> his work for years, but it's mostly invisible to the world beyond
> Wikimedia. It makes me really happy to see him get some external
> recognition :-)
>
> Thanks,
> Sue
>
> ---------- Forwarded message ----------
> From: "Jay Walsh" <jwa...@wikimedia.org>
> Date: 27 Dec 2013 12:20
> Subject: [Wmfcc-l] [press] Erik Z in Wired
> To: "Communications Committee" <wmfc...@lists.wikimedia.org>
> Cc:
>
> http://www.wired.com/wiredenterprise/2013/12/erik-zachte-wikistats/
>
> Meet the Stats Master Making Sense of Wikipedia’s Massive Data Trove
>
> BY ASHIK SIDDIQUE
> 12.27.13
> 9:30 AM
>
> Erik Zachte. Photo: Lane Hartwell/Wikimedia Foundation
>
> There are websites, and then there’s Wikipedia. The internet behemoth
> boasts 30 million articles written in more than 285 languages, tweaked
> by 70,000 active editors and viewed by 530 million visitors worldwide
> each month. As mountains of information go, it’s Everest. Teasing out
> trends from the open source encyclopedia’s archives is a task few
> would even attempt. Yet Erik Zachte did just that.
>
> Zachte used his statistical intuition to create “Wikistats,” an online
> statistics package that’s more than a trove of charts and graphs for
> data geeks. It’s the most direct measure yet of Wikipedia’s success in
> achieving its central objective: making the sum of all human knowledge
> available to everyone everywhere.
>
> “When I discovered Wikipedia I felt thrilled from the outset,” says
> Zachte, who was working as an IT guy at KLM Airlines in the early days
> of the Wiki revolution. Not content simply to edit articles, he joined
> the mailing lists in which a fervid network of volunteers debated how
> to increase the site’s functionality.
> As Wikipedia exploded in
> popularity, power users complained there was no consistent way to
> measure its growth in article count from the beginning.
>
> “In 2003 there was already an online page counter if I remember
> correctly, but not much else,” says Zachte. He realized it was
> possible to extract far more descriptive data from historical metadata
> in Wikipedia’s massive database dumps, copies of all raw content that
> are available to anyone in XML format.
>
> He started crunching numbers and quickly became famous among fellow
> Wikiholics for developing Wikistats. The site’s monthly reports filled
> a valuable niche for descriptive metrics in the Wiki community, with
> measures like article count, number of editors, and edits per article
> that serve as proxy indicators of Wiki quality. Impressed by Zachte’s
> stat-fu, the nonprofit Wikimedia Foundation that supports the
> Wikipedia infrastructure made him its data analyst in 2008.
>
> Since then, Zachte’s figures – all of which are open source and in the
> public domain – have revealed ongoing challenges to the organization’s
> growth, as well as noteworthy trends.
>
> Wikistats data made it clear that a core of Wikipedians does an
> outsize portion of the editing. As of October, 4.7 million people have
> contributed to the English language Wikipedia, but just over 26,000
> people have made more than 1,000 edits. In fact, that relatively small
> group of people has made 73 percent of all edits. While a small core
> of very active editors has remained stable, a larger pool of active
> editors (those making at least five edits monthly) in all Wikipedia
> language editions peaked at 90,000 in 2007 and has dropped since. As
> of October, the count stands at 70,000.
>
> That has some worried that a shrinking community indicates declining
> quality, and has prompted concerted efforts within the Wikimedia
> Foundation to boost editor engagement, which the organization
> considers one of the foremost indicators of Wikipedia’s success. In
> 2009, the organization launched an ambitious five-year strategic plan
> to drastically increase language and content diversity by encouraging
> internet users in the “Global South” – particularly the developing
> regions of Africa, Asia, the Middle East, and Latin America – to
> contribute. Wikistats metrics gauge its progress each month.
>
> “Many projects exist within WMF to influence editor influx and
> retention,” says Zachte, “but in the end Wikistats gives the final
> count: Are we on the right track?”
>
> The numbers show reason for measured optimism. While the largest and
> most densely populated language editions like English, German, French,
> and Japanese have seen the number of active editors level off or even
> decline since about 2007, newer editor networks in highly populous
> languages like Chinese, Arabic, and Persian continue to grow. In
> addition, the global share of page edits is slowly shifting to
> populous countries in the southern hemisphere, some of which, like
> India and the Philippines, use and edit Wikipedia overwhelmingly in
> English.
>
> Zachte’s reports also reveal idiosyncratic patterns of activity in
> different languages.
>
> For example, some volunteer coders program bots to create article
> stubs in massive bursts, hoping other users will expand the articles
> over time. While bots can supplement the work of active editor
> networks, Wikistats summaries show that some language editions are
> populated almost entirely by bot-created stubs – like the Cebuano and
> Waray-Waray Wikipedias, which rocketed to almost one million articles
> this year despite tiny editor networks that are unlikely to fill in
> those blanks anytime soon.
>
> Zachte’s animation of growth for all Wikipedia language sites
> measures four aspects of each site: bubbles representing each language
> slide across an x-axis indicating their age and up a y-axis measuring
> their article count, expanding as their editor networks grow and
> changing color as average article size grows. Image: Erik Zachte
>
> The data also provide raw material for striking visualizations, which
> Zachte sometimes creates and posts on his blog, Infodisiac, and
> compiles from other authors on Wikistats.
>
> For years, Zachte was the only staffer working on general metrics
> about Wikipedia, but the Wikimedia Foundation now has many
> analysts and engineers crunching data. The organization is preparing
> to absorb Zachte’s work into a much more powerful data infrastructure.
>
> “The plan is to take the existing functionality of Wikistats and
> modernize it across the board,” says Toby Negrin, Wikimedia’s director
> of analytics. “Erik’s work is amazing, but we need to make the data
> more accessible and update it faster.”
>
> One recent update is a streamlined Monthly Report Card that tracks
> user engagement by language and geographical region, with customizable
> graphs measuring factors like unique visitors, page views, and editing
> activity over time. Other extensions will capture and analyze all
> Wikimedia traffic, and provide metrics for editor engagement projects
> like Wikipedia Zero, which gives users in developing countries free
> Wikipedia access on their mobile devices.
>
> Zachte embraces the changes. “Most of what I built will be phased out
> over the coming years,” he says. “I’m fine with that. All software has
> a limited lifespan.”
>
> Until the new infrastructure can take over, Zachte maintains the
> scripts that populate Wikistats reports while working from home in
> Leiden, the Netherlands. Occasionally, he works on analytic pet
> projects.
> His next idea focuses on measuring content diversity across
> different Wikipedia language editions.
>
> “In early years Wikipedia was often characterized as mostly geek
> content: physics and sci-fi,” he says. “People don’t do that anymore,
> but is our content really balanced now? Do we have similar depth of
> content for ballet or folk culture or fashion?”
>
> Most articles in larger Wikipedias are assigned multiple categories –
> for example, the English-language entry for Barack Obama lists 45. But
> users can assign a single article many different categories, and each
> category can have an unlimited number of parent categories. That makes
> it difficult to easily compare the number of articles in each category
> as an indicator of content diversity.
>
> Zachte’s idea is that comparing word frequencies within articles to
> word frequencies for all named categories in a language (the English
> Wikipedia has over 1 million, according to a 2012 estimate) can more
> effectively categorize articles, and create profiles of which topics
> receive heavier coverage. He has written a proposal, but it’s still
> unclear how it fits into Wikimedia’s current budget. It might just be
> a hobby project – or, open source to the end, he concedes that someone
> else might as well scoop him.
>
> “Now I have given away the basic concept,” he says. “Someone can base
> her thesis on this, and beat me to it, which is fine. Science would
> progress faster if it did not thrive on secrecy.”
>
> Another Zachte animation visualizes all Wikipedia edits on a specific
> day in July 2011, on a world map in which 369,483 edits in multiple
> languages appear as geographically distributed bursts of color in a
> sped-up version of real time.
> Image: Erik Zachte
>
> Tags: Erik Zachte, Wikimedia Foundation, Wikipedia
>
> --
> Jay Walsh
> WikimediaFoundation.org
> blog.wikimedia.org
> @jansonw
>
> _______________________________________________
> Wmfcc-l mailing list
> wmfc...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wmfcc-l
>
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>