Re: [Wiki-research-l] Features that correlate with quality (Was: Quality and pageviews)

2010-06-05 Thread Federico Leva (Nemo)
Brian J Mingus, 04/06/2010 16:17: o Note that G articles are extremely hard to predict and should be merged with another quality class. Or, vice versa, is this a useful class precisely because of that, i.e. because it gives information that an automated algorithm can't give? Nemo

Re: [Wiki-research-l] Fwd: modern foundations of scientific consensus

2010-06-21 Thread Federico Leva (Nemo)
Samuel Klein, 22/06/2010 02:11: The Foundation is now in a position to help support this sort of work with better contacts and brainstorming (than it was 5 years ago when these ideas were first developed), but someone still needs to design and run these projects... I don't think anyone is

Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a universal citation index

2010-07-19 Thread Federico Leva (Nemo)
Brian J Mingus, 19/07/2010 22:20: The basic idea is a centralized wiki that contains citation information that other MediaWikis and WMF projects can then reference using something like a {{cite}} template or a simple link. The community can document the citation, the author, the book etc..

Re: [Wiki-research-l] wikistream: displays wikipedia updates in realtime

2011-08-17 Thread Federico Leva (Nemo)
Ed Summers, 22/06/2011 12:14: On Wed, Jun 22, 2011 at 2:25 AM, Dmitry Chichkov wrote: You may want to take a look at the wpcvn.com - it also displays realtime stream (filtered)... Oh wow, maybe I can shut mine off now :-) Looks like the opposite happened. Nemo

Re: [Wiki-research-l] wikitweets: view tweets that reference wikipedia in realtime

2012-04-28 Thread Federico Leva (Nemo)
Taha Yasseri, 27/04/2012 04:45: I would be very much interested, since such data shows us when and where people refer to WP in an overall image. Archiving tweets is surely useful, cf. http://archive.org/details/archiveteam-json-twitterstream-2012. An archive.org item with wikitweets in

Re: [Wiki-research-l] More accurate revert detection in Wikipedia, alternative to MD5 identical revision method

2012-06-27 Thread Federico Leva (Nemo)
I don't understand: if 35 % of the sample reverts identified by the hash method are not considered such by human check and the new system has a 70 % accuracy, the difference in false positives is 5 %? I don't understand from the paper either. The main point seems to be about the more reverts
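For readers unfamiliar with the "MD5 identical revision method" named in the subject line, here is a minimal sketch of the general idea: a revision is flagged as a revert when its text hash equals the hash of an earlier revision. This is not the code from the paper under discussion; the function name, the `window` parameter and its default of 15 are hypothetical choices for illustration.

```python
import hashlib


def find_identity_reverts(revision_texts, window=15):
    """Return (reverting_index, restored_index) pairs.

    A revision counts as an identity revert when its MD5 digest equals
    the digest of an earlier revision within `window` revisions and at
    least one revision in between had different content.
    Both the window size and this exact definition are assumptions.
    """
    digests = [
        hashlib.md5(text.encode("utf-8")).hexdigest()
        for text in revision_texts
    ]
    reverts = []
    for i, digest in enumerate(digests):
        oldest = max(0, i - window)
        # Walk backwards, skipping the immediate predecessor: an
        # identical neighbour is a null edit, not a revert.
        for j in range(i - 2, oldest - 1, -1):
            if digests[j] != digest:
                continue
            # Require that something was actually undone in between.
            if any(digests[k] != digest for k in range(j + 1, i)):
                reverts.append((i, j))
            break
    return reverts


if __name__ == "__main__":
    history = ["stable text", "VANDALISM", "stable text", "stable text plus more"]
    print(find_identity_reverts(history))
```

A hash match only says that two revisions are byte-identical; whether a human would call the edit a revert is a separate question, which is the gap the message above asks about.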

Re: [Wiki-research-l] request for Git statistics (or, don't stand back, I don't know regular expressions) (Wiki-research-l Digest, Vol 83, Issue 13)

2012-07-24 Thread Federico Leva (Nemo)
Sumana Harihareswara, 20/07/2012 23:38: On 07/19/2012 04:19 PM, Federico Leva (Nemo) wrote: Sumana Harihareswara, 19/07/2012 22:08: I noticed the jump in the June engineering report. Where does the big difference compared to previous month's number come from? Nemo Now that we've re

Re: [Wiki-research-l] Data for Portuguese Wikipedia

2012-08-11 Thread Federico Leva (Nemo)
emijrp, 08/11/2012 10:25 AM: You can analyse the dump http://dumps.wikimedia.org/ptwiki/20120806/ or ask for an account in Toolserver http://toolserver.org/ or both. ...but you first have to define what a revert is for you, as this is not obvious at all (see also lengthy discussions on this

Re: [Wiki-research-l] Open-Access journals for papers about wikis

2012-09-15 Thread Federico Leva (Nemo)
emijrp, 15/09/2012 11:12: The idea of creating a journal just for wikis is highly seductive for me. The pillars might be: * peer-reviewed, but publish a list of rejected papers and the reviewers' comments * open-access (CC-BY-SA) * ask always for the datasets and offer them to download, the same

Re: [Wiki-research-l] [Wikimedia-l] Copy and paste

2012-10-18 Thread Federico Leva (Nemo)
How hard would it be to set up a tool like the software that, as far as I know, MIT uses to automatically check plagiarism among theses etc. submitted to their digital library, checking the text of all Wikimedia projects against e.g. newspaper websites and Google Books, and then publishing

Re: [Wiki-research-l] Minor stats on Wikipedia

2012-10-31 Thread Federico Leva (Nemo)
Piotr Konieczny, 31/10/2012 23:08: Would anyone have/know where to find any of the following estimates for English Wikipedia, either as a number or as % of the total population of editors (which is known): Not exactly, but: * of people who edited Wikipedia anonymously

Re: [Wiki-research-l] 2012 top pageview list

2013-01-03 Thread Federico Leva (Nemo)
Kerry Raymond, 02/01/2013 22:46: The problem (as always) is that there is a difference between pages served (by the web server) and pages actually wanted and read by the user. It would be interesting to have referrer statistics. I'm guessing that many Wikipedia pages are being referred by

Re: [Wiki-research-l] Documenting WMF-related data souces: some questions to help me do it better.

2013-02-27 Thread Federico Leva (Nemo)
Maria Miteva, 27/02/2013 16:30: Hi everyone, I am working on creating a single entry page describing all the data about Wikipedia and WMF projects available for researchers. The idea is to have a single location, [...] So it will be integrated with

Re: [Wiki-research-l] Inventory of articles with video?

2013-03-21 Thread Federico Leva (Nemo)
Andrew Lih, 22/01/2013 20:54: Laura, thanks for your insight into this. I also worried about the generic ogg container and not knowing exactly whether it was audio or video, without digging deeper into the metadata. Now that we have WebM, how much do things change? An interesting thing IMHO is

Re: [Wiki-research-l] Any tools available to query a presence of a (digital humanities) website across language versions of Wikipedia

2013-07-12 Thread Federico Leva (Nemo)
Han-Teng Liao, 12/07/2013 11:14: Dear all, I wonder if there are any tools available to query a presence of a (digital humanities) website across language versions of Wikipedia? Digital humanities websites can be seen as more willingly digital GLAM institutions, and thus the need for

Re: [Wiki-research-l] Readable characters vs. size in bytes of articles

2013-08-06 Thread Federico Leva (Nemo)
Ziko van Dijk, 06/08/2013 02:12: Hello, When in 2008 I made some observations on language versions, it struck me that in some cases the wikisyntax and the meta article information was more KB than the whole encyclopedic content of an article. For example, the wikicode of the article Berlin in

Re: [Wiki-research-l] I WikiTrust no more

2013-09-04 Thread Federico Leva (Nemo)
Luca de Alfaro, 04/09/2013 19:27: Apologies for the pun title, but the message is this: I will no longer be maintaining the WikiTrust installation at UCSC. Thanks for your continued support of the tool till now! If anyone is interested in the data provided by WikiTrust, I would be happy to

[Wiki-research-l] Some data from the Italian Wikipedia deletion process

2013-10-03 Thread Federico Leva (Nemo)
Not too long ago (May 2011), the Italian Wikipedia switched from its traditional vote-based deletion process to an open-ended discussion process (so-called consensus) similar to en.wiki's (lege: chaos). Some data is available about the effects and just waits to be analysed:

Re: [Wiki-research-l] How to unsubscribe

2013-11-09 Thread Federico Leva (Nemo)
Ankur Padia, 09/11/2013 06:42: Hello everyone, Could any one let me know how to unsubscribe from the mailing list for wiki-research ? Thanks in advance. https://lists.wikimedia.org/mailman/options/wiki-research-l ___ Wiki-research-l mailing

Re: [Wiki-research-l] data about failed searches

2013-12-19 Thread Federico Leva (Nemo)
Gerard Meijssen, 19/12/2013 12:06: Hoi, Sorry .. the link [1] and the blog post [2] I wrote when I learned about it. Thanks, Gerard [1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks [2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html Ah.

Re: [Wiki-research-l] Biases in Wikipedia Coverage of Academics

2014-01-25 Thread Federico Leva (Nemo)
Taha Yasseri, 24/01/2014 23:48: And some of you had made great suggestions to improve the article, especially Paolo Massa, whom I thank again here. Further comments and feedback are more than welcome. I had already placed some on

Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-06 Thread Federico Leva (Nemo)
Sort of related, an ongoing education@ discussion on student evaluation criteria. http://thread.gmane.org/gmane.org.wikimedia.education/854 Nemo

Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-17 Thread Federico Leva (Nemo)
Given that the discussion is all happening here anyway, I'll copy the comment I left on the blog. :) Nemo February 16, 2014 at 2:43 pm Your comment is awaiting moderation. Interesting trivia to munch, but little nutritional value IMHO. Everyone in every field always complains

Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-20 Thread Federico Leva (Nemo)
Stuart A. Yeates, 18/02/2014 01:48: What would be great would be a set survey of the top 5000 (i.e. the group that Laura is already working with) where they were asked basic questions about the fields they edited in and their perception of gender bias, then half way through they were presented

Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-22 Thread Federico Leva (Nemo)
Jane Darnell, 22/02/2014 23:23: [...]he amount of art in the museum is overwhelmingly Italian, Dutch/Netherlandish, and French [...] The horror! Those Italians, Dutch and French should really be ashamed of all the unjust advantage they amassed in centuries of abusive domination of the

Re: [Wiki-research-l] The role of English Wikipedia's top content creators in perpetuating gender bias

2014-02-23 Thread Federico Leva (Nemo)
Jane Darnell, 23/02/2014 10:37: Men, when perceiving anti-male behavior tend to do the opposite, namely they become aggressive and stand their ground. True, and in laughable ways even. Is such an attitude, however, so peculiarly true of this male label, which seems so secondary and useless?

Re: [Wiki-research-l] identifying Wikipedia article topics

2014-03-17 Thread Federico Leva (Nemo)
Amir E. Aharoni, 17/03/2014 16:21: Is there any known easy way to classify Wikipedia articles into a relatively small number of types? By relatively small I mean no more than twenty, and by types I mean things that are intuitively clear to readers [...] Your examples don't really seem topics

Re: [Wiki-research-l] Collecting Keywords in wikipedia page(s)

2014-03-24 Thread Federico Leva (Nemo)
Dr. Nick Lee, 24/03/2014 08:29: Does any of you have suggestions for a tool to use to collect keywords in Wikipedia pages? Any suggestion will be very very very much appreciated. Just a few days ago: http://lists.wikimedia.org/pipermail/wiki-research-l/2014-March/003339.html. Does it help?

Re: [Wiki-research-l] Fwd: Delivery Status Notification (Failure)

2014-04-27 Thread Federico Leva (Nemo)
h, 28/04/2014 03:36: Is it just me or something went wrong with http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm ? From my browser, the fields of country names, population and internet users are all missing from the report. I tried to send an email to the email address

Re: [Wiki-research-l] Wikipedia traffic: selected language versions

2014-05-17 Thread Federico Leva (Nemo)
h, 17/05/2014 01:54: Thus, we might want to share what has been done +1, but: and what could be done regarding the current traffic data provided by the Wikimedia Foundation while acknowledging the sensitivity of the traffic data release, what additional data would you need and why, given

Re: [Wiki-research-l] Wikipedia traffic: selected language versions

2014-05-18 Thread Federico Leva (Nemo)
Thanks for your suggestions. Just some quick pointers below. h, 18/05/2014 08:26: (I-A). Tabulate the data points in absolute numbers first, not percentage numbers [...] (I-B). Include all language versions for the *editing traffic* report as well. [...] (I-C). Provide static data objects in

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Federico Leva (Nemo)
Brian Keegan, 18/05/2014 18:10: Is there a way to retrieve a canonical list of bots on enwiki or elsewhere? A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv In general: please edit https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts Nemo

Re: [Wiki-research-l] Collated FlaggedRevs stats (over time)

2014-05-25 Thread Federico Leva (Nemo)
Sorry, remove the 'e' from ns: https://sq.wikipedia.org/wiki/Special:ValidationStatistics https://ru.wikipedia.org/wiki/Special:ValidationStatistics https://uk.wiktionary.org/wiki/Special:ValidationStatistics https://sr.wikinews.org/wiki/Special:ValidationStatistics

Re: [Wiki-research-l] Any studies on vandalism levels at Wikia?

2014-05-29 Thread Federico Leva (Nemo)
Piotr Konieczny, 29/05/2014 05:56: Wikia (the largest wiki farm?) appears to be drastically under-researched... Part of the reason may be that they don't offer regular data dumps. But WikiTeam has remedied and recovered dumps for most of their top 14k wikis (as well as all images):

Re: [Wiki-research-l] Any studies on vandalism levels at Wikia?

2014-05-29 Thread Federico Leva (Nemo)
Piotr Konieczny, 29/05/2014 12:22: That's intriguing, any idea why Wikia is being so unfriendly with that? Are they doing the usual corporation our data is ours/secrecy is good/we don't need your research as it may reveal things we don't want the world/competitors to know about shtick? Nothing

Re: [Wiki-research-l] Any studies on vandalism levels at Wikia?

2014-05-29 Thread Federico Leva (Nemo)
Benj. Mako Hill, 29/05/2014 18:27: Without question, the current dumps put together by WikiTeam are an awesome resource for folks wanting to do Wikia research. Thanks. I hope someone will use them. :-) That said, they are a strange sample and it's not clear how they are representative of

Re: [Wiki-research-l] Is 'Random article' statistically robust over what population?

2014-06-28 Thread Federico Leva (Nemo)
stuart yeates, 28/06/2014 05:24: Has anyone looked into this? https://bugzilla.wikimedia.org/show_bug.cgi?id=65366 was just fixed. Nemo

Re: [Wiki-research-l] Constructing sensible baselines for Wikipedia language development analytics

2014-07-09 Thread Federico Leva (Nemo)
h, 08/07/2014 13:49: This should also help sociolinguists to identify which languages [...] that are more developed than others in the Wikipedia sphere, and seeks explanations for their relative success/failure by contrasting the Wikipedia sphere and offline/online sphere. Agreed on the

Re: [Wiki-research-l] discussion about wikipedia surveys

2014-07-16 Thread Federico Leva (Nemo)
phoebe ayers, 16/07/2014 19:21: (Personally, I think the answer should be to resuscitate RCOM, but that's easy to say and harder to do!) IMHO in the meanwhile the most useful thing folks can do is subscribing to the feed of new research pages:

Re: [Wiki-research-l] Extracting PMIDs

2014-10-21 Thread Federico Leva (Nemo)
If missing, it would be best to (also) submit a mapping to DBpedia so that they extract PMIDs in next run. I only found something for ja.wiki: http://mappings.dbpedia.org/index.php/Mapping_ja:Infobox_Journal http://mappings.dbpedia.org/server/templatestatistics/en/?template=Vcite_journal (stats

Re: [Wiki-research-l] FW: Research discussion: Visions for Wikipedia

2014-10-29 Thread Federico Leva (Nemo)
Ziko van Dijk, 29/10/2014 01:09: could only be met with a broad social skills training. Difficult to implement, though.:-( Related: cMOOC, https://meta.wikimedia.org/wiki/Talk:Charting_diversity Nemo

[Wiki-research-l] Drop in amount of wiki research?

2014-11-09 Thread Federico Leva (Nemo)
The data needs cleaning (and every small edit or redirect helps), but multiple sources agree on a trend similar to this, from 2011 to 2014 (partial): 943, 778, 489, 250. http://wikipapers.referata.com/wiki/2011 http://wikipapers.referata.com/wiki/2012 http://wikipapers.referata.com/wiki/2013

Re: [Wiki-research-l] Adding or submitting one's research project about Wikipedia to Wikipedia's Research project page

2014-11-09 Thread Federico Leva (Nemo)
Wikipedia's Research project page doesn't exist, I assume we're talking of the Wikimedia Meta-Wiki Research namespace. https://meta.wikimedia.org/wiki/Research Xiangju Qin, 07/11/2014 23:39: I emailed my advisor about this. He said that he didn't understand the implications of adding one's

Re: [Wiki-research-l] List of student research assignment

2014-11-10 Thread Federico Leva (Nemo)
John Andersson, 10/11/2014 20:48: we have recently started providing them with research assignments for students' thesis work (to work on for either 10 weeks (Bachelor) or 20 weeks (Master)). As a small pilot I gave a presentation about this a couple of weeks ago and we have seen a great

Re: [Wiki-research-l] Drop in amount of wiki research?

2014-11-14 Thread Federico Leva (Nemo)
Piotr Konieczny, 14/11/2014 09:24: How complete is wikipapers referata site? Rather complete. There are some duplicates but we definitely have the majority of the publications (perhaps 90 %?) known to most sources. Could it be the case of lack of updates/maintenance/editor activity

Re: [Wiki-research-l] commentary on Wikipedia's community behaviour (Aaron gets a quote)

2014-12-12 Thread Federico Leva (Nemo)
1000th addition to the inconsequential rant genre. Nemo

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-15 Thread Federico Leva (Nemo)
Oliver Keyes, 13/12/2014 21:15: http://ironholds.org/misc/pageviews_year_and_week.png - fascinating! It reveals a lot of seasonality in the desktop views - again, not replicated on mobile (at least, not so strongly) Does this graph also go from 2013-02-01 to 2014-12-01? Nemo

Re: [Wiki-research-l] Fwd: $55 million raised in 2014

2015-01-02 Thread Federico Leva (Nemo)
Denny Vrandečić, 02/01/2015 21:17: I have one or two ideas about what to do with 3 billion US dollars. It would be a huge step towards some of my stretch goals for the movement. :) 3 billion is not that much money, for instance it's only enough to pay 6 months of operating costs of a poor

Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: [Gendergap] Wikipedia readers

2015-02-16 Thread Federico Leva (Nemo)
aaron shaw, 17/02/2015 05:50: If we want to have a more precise sense of the demographics of participants the biggest need in this space is simply higher quality survey data. My paper with Mako has a lot of detail about why the 2008 editor survey (and all subsequent editor surveys, to my

[Wiki-research-l] Fwd: Reasons you use the XML dumps or want to, but can't?

2015-02-20 Thread Federico Leva (Nemo)
FYI. Forwarded message. Subject: [Xmldatadumps-l] Your comments needed (long term dumps rewrite?) Date: Thu, 19 Feb 2015 12:30:01 +0200 From: Ariel Glenn WMF ar...@wikimedia.org To: xmldatadump...@lists.wikimedia.org The MediaWiki Core team has opened

Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Federico Leva (Nemo)
Erik Zachte, 25/02/2015 23:34: Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm Ironholds' looks more vulnerable to bots, it's easier to see in small wikis (though, kudos! many more

Re: [Wiki-research-l] Wikilink referral statistics

2015-04-29 Thread Federico Leva (Nemo)
This is a popular topic in this period. * https://meta.wikimedia.org/wiki/Research:Improving_link_coverage * https://meta.wikimedia.org/wiki/Research:Hovercards * https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage Nemo

Re: [Wiki-research-l] Waray-Waray language Wikipedia

2015-05-01 Thread Federico Leva (Nemo)
Pine W, 01/05/2015 10:44: Hi researchers, One would think that you've learnt to use WikiStats by now for trivial questions. * https://stats.wikimedia.org/EN/TablesWikipediaWAR.htm#bots * https://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm Nemo

Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Federico Leva (Nemo)
Thanks for looking into www.wikipedia.org traffic from India; I've been complaining about it for a while. :) See also: * https://phabricator.wikimedia.org/T26767 * https://phabricator.wikimedia.org/T5665 Mark J. Nelson, 07/05/2015 04:24: But for the average Copenhagener, the following order is

Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-07 Thread Federico Leva (Nemo)
Scott Hale, 07/05/2015 09:51: The accept-language header is the obvious place to start, but there is ample scope to combine multiple approaches together. Which is what UniversalLanguageSelector / jquery.uls, used on all Wikimedia projects, exists for. :) In addition to accept-language and
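As a rough illustration of the "obvious place to start" mentioned above, the sketch below orders the language tags of an HTTP Accept-Language header by their q-values. It is not how UniversalLanguageSelector or jquery.uls is implemented; the function name is hypothetical and the parsing is deliberately simplified (no wildcard or prefix matching).

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first.

    Tags without an explicit q-value default to q=1.0; tags with q=0
    (explicitly not acceptable) are dropped. Ties keep header order.
    """
    preferences = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # malformed weight: treat as unacceptable
        if quality > 0:
            preferences.append((quality, position, tag))
    preferences.sort(key=lambda pref: (-pref[0], pref[1]))
    return [tag for _, _, tag in preferences]


if __name__ == "__main__":
    print(parse_accept_language("da, en-GB;q=0.8, en;q=0.7"))
```

A portal could walk the returned list and pick the first language for which an edition exists, falling back to other signals when the header is missing or generic.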

Re: [Wiki-research-l] How to explain drop in random searches

2015-05-12 Thread Federico Leva (Nemo)
Alex Druk, 12/05/2015 07:56: Going from 86,000,000 a month to 31,000 a month is quite a drop, and the shift is pretty dramatic. It goes from 1.7 million one day to 715 the next and stays flat (http://stats.grok.se/en/201410/Special:Random). That's expected. The new data excludes redirecting

Re: [Wiki-research-l] stream.wikimedia.org

2015-04-02 Thread Federico Leva (Nemo)
Ed Summers, 02/04/2015 22:30: I was wondering if anyone had an example of using stream.wikimedia.org handy? I listed the examples I know of (from a previous thread) at https://wikitech.wikimedia.org/wiki/Stream.wikimedia.org#Clients_and_alternative_access_points Nemo

Re: [Wiki-research-l] Community health (retitled thread)

2015-06-04 Thread Federico Leva (Nemo)
Juergen Fenn, 04/06/2015 16:50: Reduced traffic on Wikimedia-l is mostly due to list moderation. That's plausible. Most people on wikimedia-l are moderated by now; I and others unsubscribed due to tyrannical moderation, too. Nemo

Re: [Wiki-research-l] Tracking authorship of wiki content

2015-08-22 Thread Federico Leva (Nemo)
Luca de Alfaro, 22/08/2015 01:51: So I got inspired, and I cleaned up some code that Michael Shavlovsky and I had written for this: https://github.com/lucadealfaro/authorship-tracking Great! It's always good when code behind a paper is published, it's never too late. If you can please add a

Re: [Wiki-research-l] citations to articles cited on wikipedia?

2015-08-21 Thread Federico Leva (Nemo)
Andrew Gray, 20/08/2015 14:21: They worked on a journal basis, classing them as OA or not OA. Weird, why didn't they just use DOAJ? https://doaj.org/ Nemo

Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-23 Thread Federico Leva (Nemo)
WereSpielChequers, 15/08/2015 15:12: With 8% more editors contributing over 100 edits in June 2015 than in June 2014 https://stats.wikimedia.org/EN/TablesWikipediaEN.htm, we have now had six consecutive months where this particular metric of the core community is looking positive. I'm not sure

Re: [Wiki-research-l] Looking for comparisons of the goals of Wikimedia and other organisations

2015-08-20 Thread Federico Leva (Nemo)
john cummings, 20/08/2015 10:30: I'm currently working as Wikimedian in Residence at UNESCO, does anyone know of any work done to compare the goals of Wikimedia with other organisations who work in education? What part of Wikimedia's activities do you classify as work in education? Nemo

Re: [Wiki-research-l] Wikipedia live monitor for identifying breaking news on Wikipedia

2015-07-19 Thread Federico Leva (Nemo)
Scott Hale, 19/07/2015 11:30: what was popular in different language editions Do you know https://tools.wmflabs.org/wikitrends/ ? Probably collaboration is accepted. as a way to possibly increase multilingual editing/consumption. Generally, a moment of high popularity of a subject (and

Re: [Wiki-research-l] Any Norwegian academics writing about Wikipedia?

2015-10-22 Thread Federico Leva (Nemo)
Laura Hale, 22/10/2015 11:16: I was wondering if any one on the list had any contacts with Norwegian academics doing research on Wikipedia, particularly from a gender gap perspective? http://wikipapers.referata.com/w/index.php?title=Special%3ALinkSearch=*.no finds two authors, though they

Re: [Wiki-research-l] Download of pageviews dataset

2015-11-11 Thread Federico Leva (Nemo)
Cristian Consonni, 11/11/2015 15:09: I am working with a student on scientific citation on Wikipedia and, very simply put, we would like to use the pageview dataset to have a rough measure of how many times a paper was viewed thanks to Wikipedia.[*] The full dataset is, as of now, ~ 4.7TB in

Re: [Wiki-research-l] Looking for help finding tools to measure UNESCO project

2015-10-06 Thread Federico Leva (Nemo)
Amir E. Aharoni, 06/10/2015 15:12: This raises a wider question: What is the comfortable way to compare the coverage of a topic in different languages? https://tools.wmflabs.org/mix-n-match/ . Example: https://tools.wmflabs.org/mix-n-match/?mode=sitestats=17 Nemo

Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-29 Thread Federico Leva (Nemo)
Pine W, 29/08/2015 01:00: By the way, is there an easy way to get info from https://stats.wikimedia.org about editor activity levels that excludes bots? They all exclude bots unless otherwise specified, see docs. Nemo

Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-25 Thread Federico Leva (Nemo)
Kerry Raymond, 25/08/2015 02:57: It would be interesting to have some coarse characterisation of edits to see if any growth in edit count is spread uniformly against all contribution types or if the growth is disproportionate in some way

Re: [Wiki-research-l] Editorial Bias in Crowd-Sourced Political Information

2015-09-03 Thread Federico Leva (Nemo)
Andrew Lih, 03/09/2015 19:12: four randomized field experiments At least, for once, this is not a euphemism for "random acts of vandalism": they say «one true positive and one true negative fact from a reputable news source on each senator that was not currently mentioned on that senator’s

Re: [Wiki-research-l] Editor Activity Analysis & Graphs

2015-09-18 Thread Federico Leva (Nemo)
jeph, 18/09/2015 06:37: I put together a presentation for the research team yesterday. Sharing it here, http://slides.com/cosmiclattes/edit-activity-graphs-analysis/. Is there a downloadable PDF on Commons? Thanks, Nemo

Re: [Wiki-research-l] Community health statistics of Wikiprojects

2016-01-07 Thread Federico Leva (Nemo)
Jonathan Cardy, 08/01/2016 06:45: If I were trying to judge the health of a wikiproject in terms of whether they are a good thing to direct newbies to I would be more interested in questions such as: How many active editors are watchlisting that wikiproject? action=info now gives a better

Re: [Wiki-research-l] "Quick" request

2016-02-22 Thread Federico Leva (Nemo)
Bruno Goncalves, 22/02/2016 22:58: There used to be official HTML dumps https://dumps.wikimedia.org/other/static_html_dumps/ but they haven't been updated in almost a decade :) The job is effectively done by Kiwix now. http://download.kiwix.org/zim/wikipedia/ For instance:

Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Federico Leva (Nemo)
Stuart A. Yeates, 28/02/2016 18:10: Finding reliable sources on this facet of private people is very, very hard Why even bother publishing original research? Nemo

Re: [Wiki-research-l] citing female academics

2016-02-28 Thread Federico Leva (Nemo)
Stuart A. Yeates, 28/02/2016 20:04: Data has been sucked from GND to wikidata via a number of routes, principally VIAF. See Wikidata:Bot_requests#Import_GND_identifiers_from_VIAF_dump for example for a discussion of an instance of this. In

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 127, Issue 18

2016-03-20 Thread Federico Leva (Nemo)
Alex Druk, 20/03/2016 07:50: Requests with "Special:Random" / total number of requests for each project. Note that such requests are no longer counted in the pageviews data because they are not HTTP 200. https://meta.wikimedia.org/wiki/Research:Page_view Nemo P.s.:

Re: [Wiki-research-l] [Xmldatadumps-l] New mirror of 'other' datasets

2016-05-15 Thread Federico Leva (Nemo)
Ariel Glenn WMF, 04/05/2016 14:33: You can access it at http://wikimedia.crc.nd.edu/other/ so please do! Great news, especially because it's ten times faster than dumps.wikimedia.org! Finally, every time I need a dataset to quickly verify a sudden idea I have, the download becomes a matter

Re: [Wiki-research-l] pagecounts and stub-meta-history

2016-07-28 Thread Federico Leva (Nemo)
Definitely consider the redirect :) https://mako.cc/copyrighteous/consider-the-redirect Bruno Goncalves, 28/07/2016 22:00: I've been trying to match edit activity with pagecounts The first question is how much data you need. If a few months are enough,

Re: [Wiki-research-l] link trails in different languages

2016-07-31 Thread Federico Leva (Nemo)
(Context: https://www.mediawiki.org/wiki/Linktrail ) Amir E. Aharoni, 31/07/2016 08:58: In other languages they will be different. Note that the $linkTrail variable in each language's MessagesXx.php file needs to be checked, to know which strings actually display as linktrails. Nemo
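To make the linktrail idea concrete, here is a small sketch of how a per-language linktrail pattern splits the text that immediately follows a closing `]]`. The pattern below is an assumption modelled on the English default (lowercase ASCII letters only); as noted above, the actual `$linkTrail` value has to be checked in each language's MessagesXx.php. The function and constant names are hypothetical.

```python
import re

# Assumed English-style linktrail: a leading run of lowercase ASCII
# letters becomes part of the link text; everything else is left alone.
LINKTRAIL_EN = re.compile(r"^([a-z]+)(.*)\Z", re.DOTALL)


def split_linktrail(text_after_link, pattern=LINKTRAIL_EN):
    """Split text following ']]' into (trail, remainder).

    `trail` is the part displayed as if it were inside the link;
    it is empty when the text does not start with a matching run.
    """
    match = pattern.match(text_after_link)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # "[[bus]]es stop here" displays the link text as "buses".
    print(split_linktrail("es stop here"))
```

Passing a different compiled pattern as `pattern` would model another language's rule, for example one that also accepts accented letters.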

Re: [Wiki-research-l] Research on automatically created articles

2016-08-13 Thread Federico Leva (Nemo)
It's worth noting that research exists which *actively* sought to change real-life behaviour of Wikipedia visitors, such as https://www.econstor.eu/handle/10419/127472 whose authors expanded articles about certain Spanish cities in order to make tourists visit those cities more. Nemo

Re: [Wiki-research-l] Multi year page views statistics

2016-07-11 Thread Federico Leva (Nemo)
Avner Kantor, 11/07/2016 13:43: Can it be done by https://tools.wmflabs.org/pageviews No. https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Updates_and_backfilling or any other tool? Sure. Preferably by using https://dumps.wikimedia.org/other/pagecounts-ez/ , but most people end

Re: [Wiki-research-l] Time Between edits - difference between RevisionID and {{NUMBEROFEDITS}}

2017-01-25 Thread Federico Leva (Nemo)
Statistics of total (content) edit rate are also available on WikiStats at https://stats.wikimedia.org/EN/TablesDatabaseEdits.htm etc. Edit and revert trends charts tend to be more useful: https://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm WereSpielChequers, 25/01/2017 16:10: One

[Wiki-research-l] Fwd: Divide XML dumps by page.page_namespace (and figure out what to do with the "pages-articles" dump)

2017-01-17 Thread Federico Leva (Nemo)
Input requested: https://lists.wikimedia.org/pipermail/wikitech-l/2017-January/087393.html , https://phabricator.wikimedia.org/T99483 Personally I think that the main issue is the slowness of some of the tools people use (including dumps.wikimedia.org itself), so I tried to improve the docs

Re: [Wiki-research-l] index of current research on wikipedia?

2016-09-11 Thread Federico Leva (Nemo)
Guillaume Paumier, 10/09/2016 16:43: WikiPapers is the main wiki-based curation platform for wiki-related academic publications, but it's down at the moment: http://wikipapers.referata.com/ Up now. I thought https://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia#Peer_reviewed

Re: [Wiki-research-l] index of current research on wikipedia?

2016-09-11 Thread Federico Leva (Nemo)
Gerard Meijssen, 11/09/2016 09:42: I wonder if it would make sense to include the data of Wikipapers in Wikidata like any other Wiki so far. Anyone is free to attempt that if they bother. Personally I won't: Semantic MediaWiki is way easier for this sort of thing (e.g. Data Transfer allows

Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-28 Thread Federico Leva (Nemo)
Dario Taraborelli, 27/08/2016 22:49: How making the edit button 10x larger is not a solution to this problem is a topic I'll reserve to a separate thread. You might want to include screenshots of the popups which are currently run to point people to the edit button. Nemo

Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-27 Thread Federico Leva (Nemo)
Pine W, 27/08/2016 09:13: What would we need in order to stimulate and nourish this kind of growth? https://strategy.wikimedia.org/wiki/Proposal:Make_Wikimedia_projects_scale Nemo ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org

Re: [Wiki-research-l] Wiki-editors' activity

2016-10-17 Thread Federico Leva (Nemo)
Alex Yarovoy, 17/10/2016 18:58: Does Wikipedia store any metadata, logs or anything useful to track one's activity? https://wikitech.wikimedia.org/wiki/Logs ? Nemo

Re: [Wiki-research-l] Data on the lifespan of Wikipedia articles

2016-11-28 Thread Federico Leva (Nemo)
Stella Yu, 29/11/2016 07:00: Where could I find data on the lifespan of different types of Wikipedia articles? What do you mean by "lifespan"? Does http://wikipapers.referata.com/wiki/Revision_history help? Nemo

Re: [Wiki-research-l] Year-by-year statistics for unregistered contributors (IPs)

2016-12-08 Thread Federico Leva (Nemo)
Ofer Arazy, 08/12/2016 07:50: Namely, I'm looking for stats regarding the count of these IP users at the end of each calendar year, as well as their activity levels (e.g. avg. monthly edits). Something like

Re: [Wiki-research-l] another pageview db to download

2016-12-11 Thread Federico Leva (Nemo)
Alex Druk, 12/12/2016 08:32: For a few years I have maintained a web site wikipediatrends.com . For variety of reasons I cannot do it any more and the site will be closed in January. However, our DB of English wikipedia pageviews from 2007 can be used for other

Re: [Wiki-research-l] Global footprint for carbon

2017-01-12 Thread Federico Leva (Nemo)
Gerard Meijssen, 12/01/2017 07:48: Has anyone ever calculated what the footprint of Wikipedia is in terms of the production of carbon dioxide? WMF is no longer as transparent as it used to be about which servers are used etc., but someone tried some calculations:

Re: [Wiki-research-l] is there a way to get the list of edits containing specific tags in the summary

2017-03-02 Thread Federico Leva (Nemo)
Kerry Raymond, 03/03/2017 04:24: For example, can I get all the edits with 1Lib1Ref in them or some other tag to be used at an event next Monday? Since the question was already answered by Pru, I'll just provide an alternative experience: for in-person events, I prefer to let users follow

Re: [Wiki-research-l] surveying Wiki editors?

2017-03-02 Thread Federico Leva (Nemo)
Misha Teplitskiy, 02/03/2017 15:56: Does anyone have experience surveying Wikipedia editors? Can someone point me to literature that has done this or discuss how one might go about doing it? See https://meta.wikimedia.org/wiki/Editor_Survey Nemo

Re: [Wiki-research-l] Finding what is said about a topic in other articles

2017-06-14 Thread Federico Leva (Nemo)
Kerry Raymond, 14/06/2017 00:45: What is motivating this is because I often find that "what links here" often points to some surprising articles which can reveal new insights into a topic. Indeed. I always teach the "what links here" feature at all my wiki courses. Kerry Raymond,

Re: [Wiki-research-l] [discovery] Discovery Weekly Update for the week starting 2017-09-18

2017-09-26 Thread Federico Leva (Nemo)
Chris Koerner, 25/09/2017 23:32: * Mikhail created a dashboard to track the prevalence of sister project search results on fulltext search result pages on desktop, broken up by language. For example, it turns out that nearly 80% of fulltext searches show sister projects on enwiki. [30] [30]

Re: [Wiki-research-l] Digital Infrastructure Research RFP

2018-05-09 Thread Federico Leva (Nemo)
Leila Zia, 09/05/2018 21:22: The Sloan and Ford Foundations have made a request for proposals with the goal of funding a set of research projects to further our understanding of economics, maintenance, and sustainability of digital infrastructures (especially as they rely heavily on volunteer

Re: [Wiki-research-l] Terms of use for wmf dump metadata?

2018-05-16 Thread Federico Leva (Nemo)
Edward L Platt, 16/05/2018 18:57: I can find information on the copyright/terms-of-use for text and image data, but nothing explicit about the metadata. Which metadata are you talking about? The copyright license applies to the whole XML text. Federico

Re: [Wiki-research-l] Wiki Dialogue Systems

2018-05-21 Thread Federico Leva (Nemo)
Adam Sobieski, 21/05/2018 14:07: WIKI DIALOGUE SYSTEMS Exploration into the collaborative authoring and debugging of dialogue systems could result in new wiki technologies. Wiki dialogue systems could resemble spoken language dialogue systems with transcript-based user interfaces, users able

Re: [Wiki-research-l] Reader use of Wikipedia and Commons categories

2018-05-24 Thread Federico Leva (Nemo)
Ziko van Dijk, 24/05/2018 23:08: When it comes to Commons, I would be very interested to learn how many readers (or recipients) are actually non Wikipedia editors. It would be useful to consider less common but high value usage, for instance people looking for illustrations for a publication.

Re: [Wiki-research-l] Terms of use for wmf dump metadata?

2018-05-16 Thread Federico Leva (Nemo)
Edward L Platt, 16/05/2018 23:16: The derivatives in this case are coeditor networks for each WikiProject, based on which editors have edited the same articles. Is this something you produce yourself? I cannot find such a dataset in . Are you in EU?
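
For readers unfamiliar with the term, a coeditor network of the kind described in the quoted message (editors linked when they have edited the same articles) can be sketched as below. This is only a minimal illustration, not the dataset or method discussed in the thread: the function name `coeditor_network`, the `(editor, article)` input format and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def coeditor_network(edits):
    """Build a weighted coeditor network from (editor, article) pairs.

    Returns a dict mapping an alphabetically sorted (editor_a, editor_b)
    tuple to the number of distinct articles both editors have edited.
    The input format is hypothetical, chosen for illustration only.
    """
    # Collect the set of distinct editors per article (repeat edits collapse).
    editors_by_article = defaultdict(set)
    for editor, article in edits:
        editors_by_article[article].add(editor)

    # Every pair of editors sharing an article gains one unit of edge weight.
    weights = defaultdict(int)
    for editors in editors_by_article.values():
        for a, b in combinations(sorted(editors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up sample data, not taken from any real dump.
sample_edits = [
    ("Alice", "Article A"),
    ("Bob", "Article A"),
    ("Alice", "Article B"),
    ("Bob", "Article B"),
    ("Carol", "Article B"),
    ("Alice", "Article A"),  # a repeat edit does not add weight
]
network = coeditor_network(sample_edits)
```

With real data, the `(editor, article)` pairs would have to be extracted from revision metadata first. Restricting them to the articles of one WikiProject would give one network per project, as in the quoted description.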
