Re: [Analytics] stats.grok.se questions

2014-04-27 Thread Federico Leva (Nemo)
All your questions are answered at the FAQ. http://en.wikipedia.org/wiki/User:Killiondude/stats Nemo ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] [Multimedia] [x-post from Analytics] Survey tool for features

2014-05-01 Thread Federico Leva (Nemo)
Andrew Gray, 30/04/2014 21:47: It's certainly simpler to implement, but anything that involves on-wiki recording has two main problems: * friction in saving the entry (eg edit conflicts, login required, user IP blocked) Edit conflicts are supposed to be avoided, when appending text to a page

Re: [Analytics] Please help maintain our dashboard directory

2014-05-21 Thread Federico Leva (Nemo)
Erik Moeller, 21/05/2014 08:21: The only reference dashboard directory we have right now, AFAIK, is: https://meta.wikimedia.org/wiki/Research:Data/Dashboards I maintain a list at https://meta.wikimedia.org/wiki/Statistics There's a bunch of Limn dashboards missing from this list, and there

Re: [Analytics] [wmfresearch] Want to examine editors cross-wiki activities, have a table.

2014-06-12 Thread Federico Leva (Nemo)
Pine W, 13/06/2014 05:27: Interesting! Is there a way that I can use this with metrics.wikimedia.org to perform cross-wiki cohort analysis, or do I need access to analytics-store.eqiad.wmnet?

Re: [Analytics] Fwd: ** PROBLEM alert - tungsten/Throughput of event logging events is CRITICAL **

2014-06-25 Thread Federico Leva (Nemo)
Nuria Ruiz, 25/06/2014 09:48: CRITICAL: 7.14% of data exceeded the critical threshold [500.0] Love, Icinga Cute! Nemo

Re: [Analytics] datasets.wikimedia.org

2014-09-12 Thread Federico Leva (Nemo)
Toby Negrin, 07/08/2014 08:38: Hi Andrew -- should we redirect / to /public-datasets? There's no content there, just a default web page which is a bit confusing. Also, why can't all this be in the standard dumps.wikimedia.org domain? Nemo

[Analytics] Special:Statistics patch, anyone interested?

2014-09-28 Thread Federico Leva (Nemo)
https://gerrit.wikimedia.org/r/#/c/147505/ If someone could review this MediaWiki patch, I'd be most grateful. It's so frustrating when users, journalists and everyone mention Special:Statistics numbers and get surprised at the existence of WikiStats. Nemo

Re: [Analytics] Analytics services reliability question

2014-10-13 Thread Federico Leva (Nemo)
Christian Aistleitner, 13/10/2014 11:49: Hence, more things get brought to the list than would surface on targeted lists or for closed down shops. +1 I think recently information quality on this list has increased a lot. If one looks only at public mailing lists archives (as opposed to

Re: [Analytics] Mobile page views available

2014-10-15 Thread Federico Leva (Nemo)
Denny Vrandečić, 15/10/2014 22:08: Just a quick question - does mobile in this case mean a) views through Tablets or Phones using a browser? b) views through the Wikipedia App? Neither. Just the mobile site, quoting: «And de.m.voy Berlin 176 314159 would stand for 176 requests to

Re: [Analytics] Wikimedia Research showcase – October 15 2014, 11.30 PT

2014-10-16 Thread Federico Leva (Nemo)
Aaron Halfaker, 16/10/2014 19:40: I'd argue that the continued persistence of the problems surrounding the impersonal negative reception of newcomers is the result of a lack of adaptability. Policy calcification (if that's an appropriate term for the observed trends) is one bit of evidence of

Re: [Analytics] [LangEng] Analytics

2014-11-17 Thread Federico Leva (Nemo)
Nuria Ruiz, 17/11/2014 19:53: I think there is a confusion between the not so well named beta environment (testing environment in labs, which is what our thread refers to) and being a beta-feature. AKA https://bugzilla.wikimedia.org/show_bug.cgi?id=56537 Nemo

Re: [Analytics] Virtual file view hack for Media Viewer views

2015-02-05 Thread Federico Leva (Nemo)
Gergo Tisza, 04/02/2015 21:00: Do you see any fundamental problem with this? A dummy image request seems rather reasonable. (I assume varnish can handle such load of atypical requests.) Making additional requests is ugly, but until we get SPDY our articles typically make dozens or hundreds

Re: [Analytics] Relevant Content Availability

2015-01-21 Thread Federico Leva (Nemo)
If you just need ballpark numbers, the proposed approach might work. If you want to produce something concretely usable, it's going to be much more complex: https://meta.wikimedia.org/wiki/Research:Measuring_mission_success In particular, 100k is a ridiculous number and restricting yourself

Re: [Analytics] Fwd: Calculating interlinks between Wikipedias

2015-01-18 Thread Federico Leva (Nemo)
Nice, is there a higher resolution version of the image? I'm having difficulties reading it. Neta Livneh, 18/01/2015 18:53: 2. There is a group of interconnected wikis that are based on Swedish (Dutch, Waray-Waray, Cebuano, Vietnamese, Indonesian, Minangkabau). Looks like a list of Lsjbot

Re: [Analytics] Fwd: Calculating interlinks between Wikipedias

2015-01-18 Thread Federico Leva (Nemo)
Neta Livneh, 18/01/2015 19:57: I think this is a better version. Thanks. I think the way to read this graph is that it's naturally darker below the diagonal line, and fairer above it. In fact, position (x, y) is the percentage of articles in wiki x which also exist in wiki y. If y x we

Re: [Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Federico Leva (Nemo)
Gergo Tisza, 08/01/2015 02:52: Even better if it can be filtered by the editcount of the user at the time of the edit. Then you probably want something like https://stats.wikimedia.org/EN/TablesWikipediaHU.htm#editor_activity_levels but with File namespace disaggregated from Other. Nemo

Re: [Analytics] Relevant Content Availability

2015-03-17 Thread Federico Leva (Nemo)
Abdel Samad, Rawia, 21/01/2015 09:47: I work for a consulting firm called Strategy. We have been engaged by Facebook on behalf of Internet.org to conduct a study on assessing the state of connectivity globally. One key area of focus is the availability of relevant online content. We are using a

Re: [Analytics] US Gov released request log datasets today

2015-03-19 Thread Federico Leva (Nemo)
Jeremy Baron, 19/03/2015 20:59: I wonder what their pageview definition is. :) Whatever Google Analytics uses, it seems? :/ https://github.com/18F/analytics-reporter Nemo

Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Federico Leva (Nemo)
Christian Aistleitner, 07/03/2015 15:14: P.S.: The above URL has diagrams! Click the URL! And with colours! So it's like checking heartbeats, cute. :) Nemo

Re: [Analytics] [Announce] New daily feed: media file request counts

2015-03-25 Thread Federico Leva (Nemo)
Hay (Husky), 25/03/2015 11:03: Answering my own question: until somebody puts up a stats.grok.se-like interface for the mediacounts, i've hacked together a Python script that can be used to 'query' the TSV files with a file, or a list of files:

Re: [Analytics] Odd data in dumps

2015-03-01 Thread Federico Leva (Nemo)
Getting aggregation right is hard. :) The only wiki I know whose counts are surely wrong is wmfwiki: https://phabricator.wikimedia.org/T51266 Nemo

[Analytics] Fwd: Reasons you use the XML dumps or want to, but can't?

2015-02-20 Thread Federico Leva (Nemo)
FYI Forwarded message Subject: [Xmldatadumps-l] Your comments needed (long term dumps rewrite?) Date: Thu, 19 Feb 2015 12:30:01 +0200 From: Ariel Glenn WMF ar...@wikimedia.org To: xmldatadump...@lists.wikimedia.org The MediaWiki Core team has opened

Re: [Analytics] [Wiki-research-l] [Release]

2015-02-25 Thread Federico Leva (Nemo)
Erik Zachte, 25/02/2015 23:34: Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm Ironholds' looks more vulnerable to bots, it's easier to see in small wikis (though, kudos! many more

Re: [Analytics] article creation stuck in February

2015-04-30 Thread Federico Leva (Nemo)
Amir E. Aharoni, 30/04/2015 14:37: The article creation tables were last updated for February: http://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm Apparently it's stuck on the lack of a fa.wiki dump: * https://stats.wikimedia.org/WikiCountsJobProgress.html *

Re: [Analytics] rsync from stat1002 broken

2015-06-22 Thread Federico Leva (Nemo)
Oliver Keyes, 22/06/2015 22:51: search dashboards (Presumed context: http://searchdata.wmflabs.org/ . Thanks [[m:Statistics]].) Nemo

Re: [Analytics] [Wikimedia-l] Wikipedia article per speaker

2015-06-13 Thread Federico Leva (Nemo)
Asaf Bartov, 13/06/2015 02:42: The (already existing) metric of active-editors-per-million-speakers is, it seems to me, a far more robust metric. Erik Z.'s stats.wikimedia.org is offering that metric. I personally agree on this in general, but Millosh is trying

Re: [Analytics] deletion of newly created articles

2015-05-31 Thread Federico Leva (Nemo)
You can presumably replicate https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation for other languages with the code provided, if you're interested in more than the 8 or so biggest Wikipedias. Or you can ask the speedy deletion Wikias:

Re: [Analytics] Request for three viewership statistics

2015-07-07 Thread Federico Leva (Nemo)
Pine W, 07/07/2015 02:29: (2) During the past 90 days or so, how many unique users have viewed https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png on the various Wikimedia pages where it's included? (2) During the past 90 days or so, how many times has

Re: [Analytics] pageviews_hourly table

2015-08-23 Thread Federico Leva (Nemo)
Tilman Bayer, 22/08/2015 19:33: And I know that other issues were caught by ErikZ's proactive vigilance, which will need to find an equivalent in the upcoming replacement for Wikistats. +1 Nemo

Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone

2015-10-27 Thread Federico Leva (Nemo)
Jonathan Morgan, 27/10/2015 18:53: Either way, it's safe to say that the total number is in the millions. +1. It's correct to say that millions have edited Wikipedia, and probably editors for Wikimedia projects are in the order of 10^7. There is no information gain in trying to give more

Re: [Analytics] How is "article" defined in Special:Statistics?

2015-10-28 Thread Federico Leva (Nemo)
https://www.mediawiki.org/wiki/Manual:Article_count Nemo

[Analytics] Does StackExchange have more monthly active users than Wikipedia?

2015-11-13 Thread Federico Leva (Nemo)
Some information at https://meta.stackexchange.com/questions/269334/how-many-active-users-contributors-does-stack-overflow-stack-exchange-have/ TL;DR: not really, and definitely not StackOverflow alone (~14k). But perhaps the whole StackExchange has more than the English Wikipedia alone.

Re: [Analytics] Does StackExchange have more monthly active users than Wikipedia?

2015-11-13 Thread Federico Leva (Nemo)
Timo Tijhof, 14/11/2015 01:38: StackOverflow's recent blog post about renaming their organisation does make an interesting claim though. https://blog.stackoverflow.com/2015/09/were-changing-our-name-back-to-stack-overflow/ > The [Stack Exchange] network as a whole has more monthly 5-time

Re: [Analytics] Special:Log/move and Special:NewPages

2015-10-30 Thread Federico Leva (Nemo)
https://meta.wikimedia.org/wiki/Article_counts_revisited

Re: [Analytics] Pagecounts strange record

2015-11-03 Thread Federico Leva (Nemo)
Giacomo Marangoni, 31/10/2015 17:55: Sometimes I found record like this “it.n Addio_al_regista_Sydney_Pollack 1 0” and I can’t explain myself how a page could be visited one time and turn back a response of 0 byte. Did you check whether the page existed at the time? pagecounts-raw also

Re: [Analytics] Vital Signs dashboard

2015-08-25 Thread Federico Leva (Nemo)
Neil P. Quinn, 25/08/2015 01:58: Sorry—it turns out that this is a browser bug! All of the graphs except legacy pageviews display no lines at all in Firefox (I've tested 42.0a2 and 41.0b3). I really should have checked that first. I'll file it in Phab; hopefully, you can take a look at some

[Analytics] Anyone using the user_daily_contribs table/API?

2015-09-05 Thread Federico Leva (Nemo)
See https://phabricator.wikimedia.org/T85984 The user_daily_contribs table (and associated API) is sometimes used for * JavaScript (e.g. CentralNotice) targeting users based on activity in a certain timeframe, * simplification of SQL queries (e.g. [1]), * other? If you use this data/feature

Re: [Analytics] corrupted and missing log files

2015-09-14 Thread Federico Leva (Nemo)
Users also keep a list in the stats.grok.se FAQ: https://en.wikipedia.org/wiki/User:Killiondude/stats#Are_there_known_dates_for_which_complete_sets_have_not_been_compiled_although_the_data_seems_to_be_available Nemo

Re: [Analytics] Users changing language version through interwiki links

2015-09-12 Thread Federico Leva (Nemo)
Strainu, 12/09/2015 14:43: Would it be possible to track the number of users changing language version in each article? Like: on date X, Y users visited a.wikipedia.org and Z left to go to b.wikipedia.org, T left for c.wikipedia.org etc. Indeed! We've been promised this data release already at

Re: [Analytics] Confusing pageviews

2015-12-02 Thread Federico Leva (Nemo)
Oliver Keyes, 02/12/2015 18:52: Via Brian Davis we find out the responsible patch is https://github.com/wikimedia/analytics-refinery-source/commit/05e5da92553dbd3e691eb45d40e559895337935f Context:

Re: [Analytics] Preliminary goals for analytics infrastructure team

2015-12-03 Thread Federico Leva (Nemo)
Jon Katz, 03/12/2015 06:16: Pywik up and running for iOS (simple machine spin-up, if not finished in Q2) What is meant here by "Pywik"? Piwik? Some shortening of pywikibot? Other? Nemo

Re: [Analytics] Backlinks TO Wikipedia

2015-12-01 Thread Federico Leva (Nemo)
Edison Nica, 29/11/2015 16:56: how many non-wikipedia pages point to a certain wikipedia page I guess the only way we have to know this (other than grepping request logs for referrers, which would be quite a nightmare) is to access the Google Webmaster account for wikipedia.org (to which a

Re: [Analytics] Inconsistent user IDs between EventLogging and main database

2015-12-16 Thread Federico Leva (Nemo)
Neil P. Quinn, 16/12/2015 21:40: Does anyone know what's going on? Is this issue documented anywhere? There is already at least one phabricator report IIRC. Nemo

Re: [Analytics] Readership metrics for the fortnight until December 6, 2015

2015-12-14 Thread Federico Leva (Nemo)
Interesting country breakdown! Tilman Bayer, 14/12/2015 12:32: For the top three, I looked at how pageviews developed on a daily basis during the last three month including the week after this large change (until Dec 6): In Greece, the +21.6% rise was the result of an isolated spike from

Re: [Analytics] How many times has a video been played?

2015-12-15 Thread Federico Leva (Nemo)
Dan Andreescu, 15/12/2015 03:43: Or python if that's easier. https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py is very easy to use. Download from dumps.wikimedia.org is tragically slow, making any one-time analysis impractical, but /data/scratch/tmp/mediacounts on Labs

Re: [Analytics] Data collection

2015-12-14 Thread Federico Leva (Nemo)
Erik Zachte, 14/12/2015 14:14: I can run similar reports for earlier months. Thanks for publishing that code too! https://github.com/wikimedia/analytics-wikistats/tree/master/dammit.lt/bash Nemo

Re: [Analytics] timestamp wiki pagecounts

2015-12-24 Thread Federico Leva (Nemo)
Maurice Vergeer, 24/12/2015 10:16: I am looking at your pagecounts as archived on https://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-12/ Can you tell me from what timezone the time stamps originate? Any and all timestamps in dumps.wikimedia.org are in UTC. Apparently this is not as

Re: [Analytics] Survey for Wikipedia readers

2016-05-30 Thread Federico Leva (Nemo)
Vipul Naik, 31/05/2016 02:51: Any feedback on the survey questions would also be appreciated, on- or off-thread. You should specify what the data will be used for. Nemo

Re: [Analytics] Pageview stats tools

2016-01-31 Thread Federico Leva (Nemo)
Pine W, 31/01/2016 09:07: Apologies if this information was already published and I missed it. https://phabricator.wikimedia.org/T120497 https://phabricator.wikimedia.org/T43327 Nemo

Re: [Analytics] Multimedia data being crunched, expanded - first look

2016-01-25 Thread Federico Leva (Nemo)
Mark Holmquist, 25/01/2016 15:58: You can find the graphs here: https://edit-analysis.wmflabs.org/multimedia-health At the default size, I only see 2010 in the x axis. Is there a way to reduce scale without zooming out? Only at 33 % zoom I manage to see 2015, and that's not very readable.

Re: [Analytics] [Data Release] [Data Deprecation] [Analytics Dumps]

2016-03-23 Thread Federico Leva (Nemo)
Dan Andreescu, 23/03/2016 15:58: *Clean-up:* Analytics data on dumps was crammed into /other with unrelated datasets. We made a new page to receive current and future datasets [3] and linked to it from /other and /. Please let us know if anything there looks confusing or opaque and I'll be

Re: [Analytics] Dark traffic

2016-03-01 Thread Federico Leva (Nemo)
James Forrester, 01/03/2016 15:59: to be more of a "good citizen" of the Internet ...people should make their websites HTTPS. Nemo

Re: [Analytics] Data Request: How many % of new WP Users do submit an Email?

2016-04-13 Thread Federico Leva (Nemo)
When you say "users in the german speaking area", would you content yourself with newly created accounts on the German Wikipedia (and sister projects in German)? Nemo Nuria Ruiz, 13/04/2016 17:44: (cc-ing analytics@, our public list in case other contributors can chime in for ideas) >For

Re: [Analytics] Retrieving filenames for category

2016-05-20 Thread Federico Leva (Nemo)
Sander Ubink, 20/05/2016 14:30: We cannot find out how BaGLAMa collects the filenames for all files within a category. Usually

Re: [Analytics] [WikimediaMobile] "Among mobile sites, Wikipedia reigns in terms of popularity"

2016-05-11 Thread Federico Leva (Nemo)
Thanks; Nielsen data can indeed be very useful, I asked about it earlier because I'd love to have it again for Italy. https://meta.wikimedia.org/w/index.php?title=Talk:ComScore/Announcement=15227130 Nemo Tilman Bayer, 11/05/2016 19:23: New study (US only) by the Knight Foundation:

Re: [Analytics] Video view stats

2016-05-17 Thread Federico Leva (Nemo)
Itzik - Wikimedia Israel, 17/05/2016 11:29: do we have a tool to pull this numbers (for people without sql access)... Yes, the same as earlier: https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts#Clients Unless you mean "for people without command line access on their computer",

Re: [Analytics] Reports on Views and Edits per Country

2016-04-18 Thread Federico Leva (Nemo)
Andre Klapper, 18/04/2016 13:29: http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm says it's "Discontinued since June 2015" but does not tell me where to find recent view/edit reports per country. Anybody knows if recent data exists and where it's

Re: [Analytics] Ranking Wikimedia projects by sizes or activity levels

2016-07-12 Thread Federico Leva (Nemo)
http://wikistats.wmflabs.org/ has this on the main page. https://www.wikimedia.org/ is the easy way to remember the rank by "size", which as always is determined by how used they are. Nemo

[Analytics] Missing mediacounts for 2016-12-01

2017-02-16 Thread Federico Leva (Nemo)
As far as I can see, mediacounts.2016-12-01.v00.tsv.bz2 is missing: http://dumps.wikimedia.your.org/other/mediacounts/daily/2016/ https://dumps.wikimedia.org/other/mediacounts/daily/2016/ Nemo

Re: [Analytics] stats.grok.se used in study about Snowden and internet traffic

2017-01-19 Thread Federico Leva (Nemo)
Dan Andreescu, 19/01/2017 23:42: there are no ways to know if data is missing or there are actual gaps. These were documented on the FAQ though, also based on Erik Zachte's analysis. Nemo

Re: [Analytics] stats.grok.se used in study about Snowden and internet traffic

2017-01-19 Thread Federico Leva (Nemo)
Dan Andreescu, 19/01/2017 20:09: now that stats.grok is completely down. It's not, AFAICT: http://stats.grok.se/en/200712/Britney_Spears Only the new data is missing (since January 2016), as stated on the FAQ

Re: [Analytics] Glamorous & Massview report?

2017-02-26 Thread Federico Leva (Nemo)
Itzik - Wikimedia Israel, 26/02/2017 16:13: file from a specific commons category ("Wikimedia Israel - Channel 2 videos"). https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py Nemo

Re: [Analytics] Strange results from Wikistats

2017-02-26 Thread Federico Leva (Nemo)
Neil Patel Quinn, 22/02/2017 03:21: Any idea what's going on? See https://phabricator.wikimedia.org/T158500 Nemo

Re: [Analytics] Recent cross-language view stats

2016-09-05 Thread Federico Leva (Nemo)
Leon Ziemba, 05/09/2016 22:20: I'm not sure if this restriction actually helps with performance, maybe others could shed light on this? It's the usual problem with https://phabricator.wikimedia.org/T125345 Nemo

Re: [Analytics] Do most of the articles really receive little to no edits?

2016-09-07 Thread Federico Leva (Nemo)
Reem Al-Kashif, 07/09/2016 15:52: I always hear people saying that most of the articles usually receive little to no edits Do you mean that many articles * have not been edited in a long time (6+ months?), * have few revisions (that is?), or * have only a human editor or two? (and that is

Re: [Analytics] Parsing user agents in EventLogging data

2016-09-14 Thread Federico Leva (Nemo)
Tilman Bayer, 15/09/2016 01:21: This came up recently with the Reading web team, for the purpose of investigating whether certain issues are caused by certain browsers only. But I imagine it has arisen in other places as well. Definitely.

Re: [Analytics] Analysing link

2016-08-26 Thread Federico Leva (Nemo)
Jan Dittrich, 26/08/2016 10:03: or even click paths Do you know about https://meta.wikimedia.org/wiki/Research:Improving_link_coverage/Release_page_traces ? Nemo

Re: [Analytics] Where is the 2010 survey?

2016-09-27 Thread Federico Leva (Nemo)
Reem Al-Kashif, 27/09/2016 19:46: I was wondering where the famous (or rather infamous) 2010 survey is? This one was made by the WMF and showed that women made less than 13% of WP contributors (mentioned here ). The category at

Re: [Analytics] ensuring reader anonymity

2016-11-11 Thread Federico Leva (Nemo)
Dan Andreescu, 10/11/2016 16:00: I don't have as clear a reason for why we store the plain IP in webrequest. I think we could count uniques and all that other stuff with the IP hash. It's a good question, tentative +1 unless I'm forgetting something. I support any decrease of the storage of

Re: [Analytics] Identifying bots and bot edit decline

2016-10-11 Thread Federico Leva (Nemo)
Wikistats knows about 8017 bot usernames according to https://dumps.wikimedia.org/other/pagecounts-ez/wikistats/csv_wp_main.zip (cut -f2 -d, StatisticsBots.csv | sort -u | wc -l ). Given active editors tend to complain a lot if they get counted as bots, a comprehensive list should probably be
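The shell pipeline quoted above (cut, sort -u, wc -l) counts the distinct values in the second comma-separated field. For readers without those tools, a rough Python equivalent might look like the sketch below. The function name and the sample rows in the comment are hypothetical; the real column layout of StatisticsBots.csv is not shown here and is only assumed to have the username in the second field, as the cut -f2 call implies.

```python
def count_unique_second_field(lines):
    """Count distinct second fields, mirroring `cut -f2 -d, | sort -u | wc -l`.

    Like cut, a line without any comma is kept whole rather than skipped.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(",")
        # cut prints the entire line when the delimiter is absent
        value = fields[1] if len(fields) > 1 else line
        seen.add(value)
    return len(seen)


# Hypothetical usage on an extracted file (the path is an assumption):
# with open("StatisticsBots.csv", encoding="utf-8") as handle:
#     print(count_unique_second_field(handle))
```

Plain string splitting is used instead of the csv module on purpose, so that quoting is ignored in the same way cut ignores it.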

Re: [Analytics] [Reminder] eventlogging mysql/analytics stores maintenance

2016-12-09 Thread Federico Leva (Nemo)
This will happen today For the archives' sake, I believe "this" stands for what was announced in https://lists.wikimedia.org/pipermail/analytics/2016-December/005593.html Nemo

Re: [Analytics] On Wikipedia edits archive per county.

2017-01-02 Thread Federico Leva (Nemo)
Again, we are exclusively looking for the absolute number of Wikipedia updates per year per county. https://stats.wikimedia.org/wikimedia/squids/ has data on 23 quarters. To get the absolute number, you can multiply the percentage by the totals at

Re: [Analytics] On Wikipedia edits archive per county.

2017-01-02 Thread Federico Leva (Nemo)
Probably someone at WMF would be more appropriate for a call; I can only share information which is online. Rafael Escalona Reynoso, 02/01/2017 17:40: Thank you for your reply. While reviewing both tables, the process to define the total number of edits per country is still a bit ambiguous.

Re: [Analytics] Page view statistics for Wikimedia projects - time series resolution

2016-12-21 Thread Federico Leva (Nemo)
Laurentiu Checiu, 21/12/2016 18:46: Would it be possible to find the above mentioned time series resolution at millisecond (ms) ? Definitely not with (current) public data, but if you use the currently maintained pageviews data you can have hourly resolution:

Re: [Analytics] Does Analytics do site traffic and SEO measurement kinds of things?

2016-12-23 Thread Federico Leva (Nemo)
The scarce traffic on Wikivoyage is not especially surprising: there is a lot of competition in this niche and Wikivoyage is not particularly different in content etc. from some of its competitors. The evidence accumulated in the last few years suggests that we should accept that Wikivoyage

Re: [Analytics] Top editors in a certain namespace across sites?

2017-03-22 Thread Federico Leva (Nemo)
Andre Klapper, 22/03/2017 13:51: Does anyone know of a way to look up the top editors for a certain namespace (like "Module") across all Wikimedia sites? The easiest way is usually to run the relevant SELECT queries with a small bash script on Labs or with sql.php on tin (e.g.

Re: [Analytics] Top editors in a certain namespace across sites?

2017-03-22 Thread Federico Leva (Nemo)
(I confirm my advice. I usually use Labs of course.) Nemo

Re: [Analytics] Fwd: follow-up on editors

2017-03-22 Thread Federico Leva (Nemo)
Aaron Halfaker, 22/03/2017 22:43: · Number of editors who contribute 1 edit per month? First column of https://stats.wikimedia.org/EN/TablesWikimediaAllProjects.htm . · Is it possible/feasible to run editor retention metrics globally (versus just based on a single

Re: [Analytics] Drop in mainpage pageviews?

2017-07-15 Thread Federico Leva (Nemo)
Strainu, 15/07/2017 12:46: Starting from an unrelated discussion on meta, I noticed a significant drop in main page views for several wikis starting from April this year. Is there anything we (or Google) did at that time to justify this drop? Curiously, if you sum all languages the numbers are

Re: [Analytics] Google Code-in: Get your tasks for young contributors prepared!

2017-10-17 Thread Federico Leva (Nemo)
Lars Noodén, 17/10/2017 17:13: Ok. Is there a checklist of things to do that I may work on that task instead? In theory which was linked from via

Re: [Analytics] Google Code-in: Get your tasks for young contributors prepared!

2017-10-17 Thread Federico Leva (Nemo)
Lars Noodén, 17/10/2017 16:16: Would it be possible to add T144714, or something based on it, to the list? https://phabricator.wikimedia.org/T144714 No, because it would require the minor to sign an NDA. Nemo

Re: [Analytics] Undocumented project code in pagecounts-ez

2017-11-14 Thread Federico Leva (Nemo)
Michael Baldwin, 14/11/2017 04:43: However, I've been coming across a large number of wiki codes "en.m". The "m" code is undocumented. It appears to be the mobile version of Wikipedia, but can anyone confirm that? Should the page be updated with this information? Historically we collect most

Re: [Analytics] Tool to visualize which wiki pages link to which wiki pages?

2017-11-21 Thread Federico Leva (Nemo)
Andre Klapper, 21/11/2017 17:15: I've been wondering if anyone's aware of any visualization tool that draws a graph showing which wiki pages are linked from which other wiki pages (up to a certain depth) The closest thing I can think of is Erik's chart of category links, generated with a

Re: [Analytics] pageviews before 2015

2018-06-12 Thread Federico Leva (Nemo)
Saqib Q, 11/06/2018 22:08: I need to get page views of some bios from 2013 to 2015. Can anyone help me ? the current page views stats is not of any help. Have you checked ? Federico

Re: [Analytics] pageviews before 2015

2018-06-12 Thread Federico Leva (Nemo)
Saqib Q, 12/06/2018 13:08: OK but how to do it? Do I need to install some application to extract the page views data of some particular pages ? Just using grep should suffice, to produce a CSV you can open with LibreOffice or any spreadsheet software. But yes, you need some basic command
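For anyone who prefers a script to grep, the filtering step can be sketched in Python as below. This is only an illustration under stated assumptions: it assumes the space-separated "project title requests bytes" line layout seen in the pagecounts examples quoted elsewhere in this archive (e.g. "de.m.voy Berlin 176 314159"), and the function names and CSV header labels are hypothetical, not part of any Wikimedia tool.

```python
import csv
import io


def filter_pagecounts(lines, project, titles):
    """Yield (project, title, requests, bytes) for the wanted pages only."""
    wanted = set(titles)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines instead of failing
        proj, title, requests, size = parts
        if proj == project and title in wanted:
            yield proj, title, int(requests), int(size)


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["project", "title", "requests", "bytes"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Because filter_pagecounts is a generator, it can be fed an open (decompressed) hourly file line by line without loading the whole file into memory.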

Re: [Analytics] temporary drop in pageviews to ig.wikipedia

2018-06-21 Thread Federico Leva (Nemo)
I see the baseline is less than 200k monthly unique devices and there were no huge drops: https://analytics.wikimedia.org/dashboards/vital-signs/#projects=igwiki/metrics=MonthlyUniqueDevices Absent trivial errors, such misclassifications of entire countries have been caused in the past by ISP

Re: [Analytics] A new landing page for the Wikimedia Research team

2018-02-07 Thread Federico Leva (Nemo)
Will it be translatable with standard tools? Federico

Re: [Analytics] Wikistats2 Better maps and new metric: Legacy Pageviews (a.k.a Pagecounts)

2018-07-11 Thread Federico Leva (Nemo)
The visualisation is very clear. Nuria Ruiz, 11/07/2018 23:29: Also, we have included legacy pageviews in the UI, we used to call these pagecounts and prior to June 2015 this is the metric that we reported as pageviews for all wikimedia sites. Good to have historical data too. Thanks,

Re: [Analytics] Effects on Wikimedia web traffic trends from sites that reuse Wikimedia content and/or trademarks

2019-07-30 Thread Federico Leva (Nemo)
Because there are hundreds of mirrors and new ones are born or die about every week, it's probably worth mentioning we have some lists. https://en.wikipedia.org/wiki/Wikipedia:Mirrors_and_forks/All https://meta.wikimedia.org/wiki/Live_mirrors Federico

[Analytics] 43k monthly active editors on the English Wikipedia

2020-05-23 Thread Federico Leva (Nemo)
The English Wikipedia is showing a pattern that I don't notice on several other wikis. If I'm not mistaken, in April 2020 monthly active editors passed 43k for the first time since 2011 (the year when MobileFrontend was created).

[Analytics] Wikisource pageviews by agent and method

2020-06-15 Thread Federico Leva (Nemo)
The pageviews statistics for the Italian Wikisource are very confusing to me: In May there were supposedly more than 5 million pageviews, of which 3M desktop + 2M

Re: [Analytics] Wikisource pageviews by agent and method

2020-06-15 Thread Federico Leva (Nemo)
Dan Andreescu, 15/06/20 16:37: Nemo would this, our next-up priority for Wikistats , help? Basically, it would let you filter on two different dimensions, so you can look at just user desktop or spider mobile, etc. Maybe. I don't have a pressing need

Re: [Analytics] Growth in reader engagement since 2016?

2021-04-30 Thread Federico Leva (Nemo)
Thanks Kate for the update! On 30/04/21 02:08, Kate Zimmerman wrote: [...] active editors increased 18 percent, not 36 percent[3,5]. [...] [3] December 2020 content interactions and active editor data from https://commons.wikimedia.org/wiki/File:December_2020_Wikimedia_movement_metrics.pdf

Re: [Analytics] Growth in reader engagement since 2016?

2021-04-03 Thread Federico Leva (Nemo)
On 17/03/21 10:53, Tilman Bayer wrote: Are the underlying numbers published somewhere? Hanlon's razor suggests looking at the most stupid explanation available for this number. The easiest piece of statistics currently available to someone who's looking for one in a hurry is the "unique

[Analytics] Re: Pageviews per country

2022-12-21 Thread Federico Leva (Nemo)
On 21/12/22 19:55, Ismael Olea wrote: About the rationale, one of the bigger drivers nowadays is the well known link between heritage, tourism and sustainability (example: the Sustainable Development Goals), so there is a trend to better analyze this context to study and plan. Usually

[Analytics] Re: Pageviews per country

2022-12-21 Thread Federico Leva (Nemo)
On 20/12/22 20:03, Ismael Olea wrote: We are working with a heritage institution in a GLAM project and they are interested in access statistics for the resources they have released in Wikimedia. I think I got the point about how the pageviews concept is and how to use it but, as far as I

[Analytics] Re: best programme ot work with data

2023-01-26 Thread Federico Leva (Nemo)
On 26/01/23 18:04, Robert Garrigos wrote: with some 1.5 million rows, I cannot open it with numbers or libreoffice to do sum of the column 4. Which tools do you use to work with such big files? To sum a column in a CSV I would use visidata: https://www.visidata.org/docs/join/ I think
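As an alternative to the visidata suggestion, summing one column of a large CSV can also be done by streaming the file, so that the 1.5 million rows never need to be held in memory at once. The sketch below is a hedged illustration, not the method proposed in the thread: the function name, the assumption of a header row, and the choice to skip non-numeric cells are all mine.

```python
import csv


def sum_column(rows, column_index=3, has_header=True):
    """Sum one column (0-based index; 3 means the fourth column) row by row."""
    reader = csv.reader(rows)
    if has_header:
        next(reader, None)  # discard the header row if there is one
    total = 0.0
    for row in reader:
        try:
            total += float(row[column_index])
        except (IndexError, ValueError):
            continue  # assumption: short rows and non-numeric cells are skipped
    return total


# Hypothetical usage (the file name is an assumption):
# with open("data.csv", newline="", encoding="utf-8") as handle:
#     print(sum_column(handle))
```

Passing the open file object straight to the function keeps memory use flat regardless of file size, which is the property that spreadsheet programs lack here.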