[Analytics] Farewell, Erik!

2019-02-06 Thread Dario Taraborelli
, or check this lovely portrait <https://www.wired.com/2013/12/erik-zachte-wikistats/> Wired published a while back about "the Stats Master Making Sense of Wikipedia's Massive Data Trove". Dario -- *Dario Taraborelli *Director, Head of Resea

[Analytics] Save the date: Wiki Workshop 2019 to be hosted at The Web Conference 2019 in San Francisco (May 13-14, 2019)

2018-12-10 Thread Dario Taraborelli
Hi everyone, We are thrilled to announce that the *6th annual Wiki Workshop* [1] will be hosted at *The Web Conference 2019* (formerly known as WWW) in San Francisco, CA, on May 13 or 14, 2019 [2]. The workshop provides an annual forum for researchers exploring all aspects of Wikipedia, Wikidata,

Re: [Analytics] Modeling interactions on talk pages and detecting early signs of conversational failure: Research Showcase - June 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-06-18 Thread Dario Taraborelli
on IRC in the #wikimedia-research channel. Looking forward to seeing you there! Dario On Thu, May 31, 2018 at 5:07 PM Dario Taraborelli < dtarabore...@wikimedia.org> wrote: > Hey everyone, > > we're hosting a dedicated session in June on our joint work with Cornell > and Ji

[Analytics] Fwd: Modeling interactions on talk pages and detecting early signs of conversational failure: Research Showcase - June 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-05-31 Thread Dario Taraborelli
Hey everyone, we're hosting a dedicated session in June on our joint work with Cornell and Jigsaw on predicting conversational failure on Wikipedia talk pages. This is part of our contribution to WMF's Anti-Harassment program. The showcase

Re: [Analytics] [Wiki-research-l] A new landing page for the Wikimedia Research team

2018-02-11 Thread Dario Taraborelli
istinfo/wiki-research-l >> >> >> > >> > >> ___ >> Wiki-research-l mailing list >> wiki-researc...@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l >> > > > > -- > Jonathan T. Morgan

[Analytics] A new landing page for the Wikimedia Research team

2018-02-06 Thread Dario Taraborelli
departments at WMF – from Analytics, to Audiences, to Grantmaking, and Programs. If you see anything that’s missing within the scope of the Research team, please let us know <https://phabricator.wikimedia.org/T107389>! Dario -- *Dario Taraborelli *Director, Head of Research, Wi

Re: [Analytics] [Wikimedia-l] Research Showcase Wednesday, January 17, 2018

2018-01-17 Thread Dario Taraborelli
nt, Engineering Admin > ___ > Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/ > wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/ > wiki/Wikimedia-l > New messages to: wikimedi...@lists.wikimedia.org > Unsubs

[Analytics] Kaggle competition to forecast Wikipedia article traffic

2017-07-18 Thread Dario Taraborelli
Merger deadline. This is the last day participants may join or merge teams. - September 1st, 2017 - Final dataset is released. - September 10th, 2017 - Final submission deadline. Competition winners will be revealed after November 10, 2017. Dario -- *Dario Taraborelli *Director, Head

Re: [Analytics] [Research-wmf] Research Showcase, December 21, 2016

2016-12-21 Thread Dario Taraborelli
nical approaches > that could be explored to mitigate risk at a project or community level. > > -- > Sarah R. Rodlund > Senior Project Coordinator-Engineering, Wikimedia Foundation > srodl...@wikimedia.org > > ___ > Research-wmf mailing li

Re: [Analytics] Upcoming Research Showcase, November 16, 2016

2016-11-16 Thread Dario Taraborelli
f >> some of the results across different languages, and to also help >> communities with having access to the results for their specific language >> project. >> >> Looking forward to seeing you there, and if you can't make it, please >> feel free to watch the video

[Analytics] SPARQL workshop and WDQS tutorials

2016-09-15 Thread Dario Taraborelli
ch Institute* and *Gene Wiki* - Lucas, *@WikidataFacts* Dario and Stas *Dario Taraborelli *Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter> ___ Analytics mail

Re: [Analytics] browser dashboards again!

2016-08-30 Thread Dario Taraborelli
ts.wikimedia.org/mailman/listinfo/analytics >>> >>> >> >> ___ >> Analytics mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> > >

Re: [Analytics] pageview counts on page redirects

2016-08-27 Thread Dario Taraborelli
media.org > https://lists.wikimedia.org/mailman/listinfo/analytics > > -- *Dario Taraborelli *Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter> ___ Analytics mailin

Re: [Analytics] Analysing link

2016-08-27 Thread Dario Taraborelli
On Sat, Aug 27, 2016 at 1:16 PM, Dario Taraborelli < dtarabore...@wikimedia.org> wrote: > The closest open dataset to what you are referring to is the clickstream > dataset: > > https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream > https://dx.doi.org/10.6084/m9.figshare

Re: [Analytics] Analysing link

2016-08-27 Thread Dario Taraborelli

[Analytics] Q4-2016 (April-June) quarterly report for Wikimedia Research

2016-07-30 Thread Dario Taraborelli
index.php?title=File:Technology_Quarterly_Review_-_Q4_FY15-16-_Research_and_Data,_Design_Research,_Analytics,_Performance.pdf=26> team's quarterly report. Best, Dario *Dario Taraborelli *Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter <http://tw

[Analytics] Research FAQ gets a facelift

2016-06-20 Thread Dario Taraborelli
a line or ping my username on-wiki. Thanks, Dario [1] https://meta.wikimedia.org/wiki/Research:FAQ [2] https://wikimediafoundation.org/wiki/Open_access_policy [3] https://meta.wikimedia.org/w/index.php?title=Research:FAQ=15176953 *Dario Taraborelli *Head of Research, Wikimedia Foundation

Re: [Analytics] Wikipedia Clickstream dataset refreshed (March 2016)

2016-05-02 Thread Dario Taraborelli
number: Hamburg, HRB 86891 > > -BEGIN PGP SIGNATURE- > Version: GnuPG v2.0.29 (GNU/Linux) > > > iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom > hTtPs://xKcd.cOm/1181/ > -END PGP SIGNATURE- > > ___ > Analytics mailing

[Analytics] Wikipedia Clickstream dataset refreshed (March 2016)

2016-04-28 Thread Dario Taraborelli
have any questions, or you can chime in on the talk page of the dataset entry on Meta <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream>. Show us what you do with this data, if you use it in your research. Dario *Dario Taraborelli *Head of Research, Wikimedia Foun

Re: [Analytics] [Wikimedia-l] [Wiki-research-l] Research showcase: Evolution of privacy loss in Wikipedia

2016-03-20 Thread Dario Taraborelli
e at #wikimedia-research > > <http://webchat.freenode.net/?channels=wikimedia-research> to ask Andrei > > questions. > > > > -Aaron > > > > On Tue, Mar 15, 2016 at 12:53 PM, Dario Taraborelli < > > dtarabore...@wikimedia.org> wrote: > >

[Analytics] Research showcase: Evolution of privacy loss in Wikipedia

2016-03-15 Thread Dario Taraborelli
users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems. *Dario Taraborelli

Re: [Analytics] [Ops] Dark traffic

2016-03-01 Thread Dario Taraborelli

[Analytics] Wiki Workshop 2016 @ ICWSM: deadline extended to March 3

2016-02-23 Thread Dario Taraborelli
Hi all – heads up that we extended the submission deadline for the Wiki Workshop at ICWSM '16 to *Wednesday, March 3, 2016*. (The second deadline remains unchanged: March 11, 2016). You can check the workshop's website for submission instructions or

Re: [Analytics] [Pageviews] [Technical] Simplifying the available static dumps of pageview data

2016-01-06 Thread Dario Taraborelli

[Analytics] What Wikimedia Research is up to in the next quarter

2015-12-18 Thread Dario Taraborelli
talk pages on Meta. You can contact us for any question on IRC via the #wikimedia-research channel and follow @WikiResearch <https://twitter.com/WikiResearch> on Twitter for the latest Wikipedia and Wikimedia research updates hot off the press. Wishing you all happy holidays, Dario and Abbey on behalf o

Re: [Analytics] Page view API questions regarding user agent

2015-12-15 Thread Dario Taraborelli
> >> > Analytics mailing list > >> > Analytics@lists.wikimedia.org > >> > https://lists.wikimedia.org/mailman/listinfo/analytics > >> > > >> > >> > >> > >> -- > >> Oliver Keyes > >>

Re: [Analytics] Backlinks TO Wikipedia

2015-12-02 Thread Dario Taraborelli
glebombs. And you might want to > also know the anchortext, that's extremely valuable for search > indexing. > > -- greg > > > > ___ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia

Re: [Analytics] Pageviews definition + measurement for apps adding link previews + using RESTBase

2015-08-19 Thread Dario Taraborelli
On Aug 18, 2015, at 6:32 PM, Kevin Leduc ke...@wikimedia.org wrote: We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition. I second that, these are

Re: [Analytics] [Wikimedia-search] Scaleable Event Systems recap

2015-08-03 Thread Dario Taraborelli
nm, clarified with Kevin. On Aug 3, 2015, at 18:38, Dario Taraborelli dtarabore...@wikimedia.org wrote: what are the implications (if any) on event validation? On Mon, Aug 3, 2015 at 3:19 PM, Tomasz Finc tf...@wikimedia.org wrote: Very excited to see this moving forward On Mon, Aug

Re: [Analytics] [Wikimedia-search] Scaleable Event Systems recap

2015-08-03 Thread Dario Taraborelli
___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics -- *Dario Taraborelli *Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter http://twitter.com/readermeter

[Analytics] Fwd: Wikipedia Page views access

2015-06-19 Thread Dario Taraborelli
Forwarding a note from Ashok Rao (cc’ed), can anyone comment on the dumps server returning 503s? Ashok – we don’t yet have an in-house API to retrieve pageview data, but the Analytics team is working on one: see this thread https://phabricator.wikimedia.org/T44259#1341010. Depending on what

Re: [Analytics] [Technical] Pick storage for pageview cubes

2015-06-09 Thread Dario Taraborelli
I too would love to understand if RESTBase can become our default solution for this kind of data-intensive API. Can you guys briefly explain what kind of queries and aggregations would be problematic if we were to go with Cassandra? On Jun 9, 2015, at 8:39 AM, Oliver Keyes

Re: [Analytics] [Technical] Pick storage for pageview cubes

2015-06-09 Thread Dario Taraborelli
, 2015 at 7:10 AM, Dario Taraborelli dtarabore...@wikimedia.org wrote: I too would love to understand if RESTBase can become our default solution for this kind of data-intensive API. Can you guys briefly explain what kind of queries and aggregations would

[Analytics] Fwd: [Wikitech-l] API BREAKING CHANGE: Default continuation mode for action=query will change at the end of this month

2015-06-03 Thread Dario Taraborelli
Many people on these lists design and use tools that depend on action=query (beyond bots). If you do, please read the following: Begin forwarded message: From: Brad Jorsch (Anomie) bjor...@wikimedia.org Subject: [Wikitech-l] API BREAKING CHANGE: Default continuation mode for action=query

Re: [Analytics] The awful truth about Wikimedia's article counts

2015-05-22 Thread Dario Taraborelli
in the comments on work related to quality assessment. -Original Message- From: analytics-boun...@lists.wikimedia.org [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Dario Taraborelli Sent: Friday, May 22, 2015 21:38 To: A mailing list for the Analytics Team at WMF

Re: [Analytics] [WikimediaMobile] Share a Fact Initial Analysis

2015-05-22 Thread Dario Taraborelli
Thanks for sharing this, Adam. Aside from engagement/funnel data, the critical question for this feature is: does it bring back eyeballs to the site from social media? It looks like it doesn’t yet, at least not in a substantial way, even with the caveat that App traffic is a very small fraction

[Analytics] The awful truth about Wikimedia's article counts

2015-05-22 Thread Dario Taraborelli
From this week’s Signpost, worth reading: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-05-20/In_focus this is a great illustration of why we need stateless, historically and globally consistent measurements to report the growth of Wikimedia projects (and

[Analytics] Fwd: [Maniphest] [Commented On] T44259: Make domas' pageviews data available in semi-publicly queryable database format

2015-05-21 Thread Dario Taraborelli
Dan – thanks for the thorough update, hope you don’t mind if I repost this to the analytics list – I bet several people on this list are eager to know where this is going. Dario Begin forwarded message: From: Milimetric no-re...@phabricator.wikimedia.org Subject: [Maniphest] [Commented

Re: [Analytics] May 2015 research showcase

2015-05-13 Thread Dario Taraborelli
a reminder that the showcase will start at 11.30 PT. Broadcast link: http://youtu.be/Hj7o5d-OEis On May 11, 2015, at 4:27 PM, Leila Zia le...@wikimedia.org wrote: Hi everyone, The next research showcase will be live-streamed this Wednesday, May 13 at 11.30

Re: [Analytics] [Technical] WMF-Last-Access

2015-04-27 Thread Dario Taraborelli
I also noticed the cookie stores a string with a 3-letter month (27-Apr-2015), any reason not to use a shorter ISO date instead (2015-04-27)? On Apr 27, 2015, at 3:00 PM, Marcel Ruiz Forns mfo...@wikimedia.org wrote: +1 'last' ___ Analytics
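To make the format question above concrete, here is a minimal sketch of the conversion, assuming the cookie value is a plain `DD-Mon-YYYY` string like the one quoted. The function name `to_iso_date` is hypothetical and not part of any Wikimedia code.

```python
from datetime import datetime


def to_iso_date(last_access: str) -> str:
    """Convert a value such as '27-Apr-2015' to ISO 8601 ('2015-04-27')."""
    # %b parses the abbreviated month name; this assumes English month
    # names, which is what the default C locale provides.
    parsed = datetime.strptime(last_access, "%d-%b-%Y")
    return parsed.date().isoformat()


print(to_iso_date("27-Apr-2015"))  # 2015-04-27
```

The ISO form is one character shorter (10 versus 11) and sorts chronologically when compared as a plain string.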

Re: [Analytics] Page views on a more frequent than hourly basis

2015-04-15 Thread Dario Taraborelli
-- Dario Taraborelli Senior Research Scientist, Research and Data Lead Wikimedia Foundation http://wikimediafoundation.org http://nitens.org/taraborelli ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo

Re: [Analytics] Page views on a more frequent than hourly basis

2015-04-15 Thread Dario Taraborelli
out which language is being queried when. Some languages (for e.g. German) we would hypothesize would have more daily seasonality than languages like English. On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli dtarabore...@wikimedia.org wrote: Hirav, Bharath – I also want to hear

[Analytics] Fwd: [Engineering] Wikimedia REST content API is now available in beta

2015-03-10 Thread Dario Taraborelli
Cross-posting from wikitech-l, this will definitely be of interest to those of you on this list who work with our APIs. Begin forwarded message: From: Gabriel Wicke gwi...@wikimedia.org Date: March 10, 2015 at 15:23:03 PDT To: Wikimedia developers wikitec...@lists.wikimedia.org,

Re: [Analytics] Provenance Params

2015-03-10 Thread Dario Taraborelli
On Mar 10, 2015, at 11:26 AM, Adam Baso ab...@wikimedia.org wrote: We're going to use the following format: ?wprov=<3_char_feature><platform_one_char><major_version_of_feature_uint> For the first version on iOS, this will be ?wprov=safi1 And Android: ?wprov=safa1 Thanks for the
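The quoted format (a three-character feature code, a one-character platform code, then an unsigned major version, as in `safi1` and `safa1`) can be split apart with a short parser. This is only a sketch: the names `Provenance` and `parse_wprov` are hypothetical, and restricting the codes to lowercase ASCII letters is an assumption based on the two examples given.

```python
import re
from typing import NamedTuple, Optional


class Provenance(NamedTuple):
    feature: str   # three-character feature code, e.g. "saf"
    platform: str  # one-character platform code, e.g. "i" or "a"
    version: int   # major version of the feature


# Assumption: codes are lowercase ASCII letters, as in the quoted examples.
_WPROV = re.compile(r"([a-z]{3})([a-z])(\d+)")


def parse_wprov(value: str) -> Optional[Provenance]:
    """Split a wprov value into its parts, or return None if malformed."""
    match = _WPROV.fullmatch(value)
    if match is None:
        return None
    feature, platform, version = match.groups()
    return Provenance(feature, platform, int(version))


print(parse_wprov("safi1"))
print(parse_wprov("safa1"))
```

Returning `None` for a value that does not fit the pattern lets a caller skip requests carrying an unrelated or mistyped parameter without raising.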

Re: [Analytics] [Technical] missing dialect subdomains in the new pageviews definition

2015-03-09 Thread Dario Taraborelli
thanks, Oliver (and James for spotting this). On Mar 9, 2015, at 2:30 PM, Oliver Keyes oke...@wikimedia.org wrote: Now logged in Phabricator at https://phabricator.wikimedia.org/T92020 On 9 March 2015 at 16:24, Oliver Keyes oke...@wikimedia.org wrote: Bah; folder names, rather than

Re: [Analytics] [Discussion] User agent data releases

2015-03-05 Thread Dario Taraborelli
heads up that after a review with Legal we decided that we should not release the sampled raw dataset. Oliver is now working on making parsed UA data available. On Mar 5, 2015, at 10:52 AM, Oliver Keyes oke...@wikimedia.org wrote: Just a clarifying note: Dario still needs to review the

Re: [Analytics] page views by location

2015-03-02 Thread Dario Taraborelli
unfortunately not. The proposal hasn’t been cleared yet and we don’t have an ETA for its launch. On Mar 2, 2015, at 9:53 AM, Seth Stephens-Davidowitz seth.steph...@gmail.com wrote: Thanks. Do you know when that might be available? Seth On Mon, Mar 2, 2015 at 12:52 PM, Dario

Re: [Analytics] page views by location

2015-03-02 Thread Dario Taraborelli
Seth, check out this proposal submitted by a team at Los Alamos National Laboratory: https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews On Mar 2, 2015, at 9:47 AM, Toby Negrin

Re: [Analytics] Provenance Params

2015-02-24 Thread Dario Taraborelli
it sounds like we have consensus for a short-term solution based on a vanilla parameter, as long as it doesn’t clash with other internal parameters. I agree with Gergo that a shortener is appealing as a long-term solution, this is what the vast majority of platforms are using for analytics

[Analytics] Wikipedia aggregate clickstream data released

2015-02-17 Thread Dario Taraborelli
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia http://dx.doi.org/10.6084/m9.figshare.1305770 This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of
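As a rough illustration of what "counts of (referer, article) pairs" means, the sketch below tallies pairs from a list of requests. It is not the pipeline used to build the dataset: the function name `count_pairs` and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_pairs(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each (referer, article) pair occurs."""
    return Counter(requests)


# Hypothetical sample rows; these are not taken from the released dataset.
sample = [
    ("Page_A", "Page_B"),
    ("Page_A", "Page_B"),
    ("Page_C", "Page_B"),
]

counts = count_pairs(sample)
for (referer, article), n in counts.most_common():
    print(referer, article, n)
```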

Re: [Analytics] s1-analytics-slave

2015-02-17 Thread Dario Taraborelli
Hi Sean, no objection on my end either. I’ll have to update a bunch of scripts that populate the EE dashboards [1] but it’s no big deal as long as we clearly communicate the ETA. [1] http://ee-dashboard.wmflabs.org/dashboards/enwiki-metrics

[Analytics] February 2015 Research Showcase: Global South survey results; data imports in OpenStreetMap

2015-02-11 Thread Dario Taraborelli
I am thrilled to announce our speaker lineup for this month’s research showcase https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase#February_2015. Our own Haitham Shammaa will present results from the Global South survey. We also invited Stamen’s Alan McConchie, an

[Analytics] Client-side URL redirects

2015-02-03 Thread Dario Taraborelli
Reporting here the results of some quick investigation we did on MediaWiki’s handling of redirects. Since this change [1] got merged, page redirects (such as en:Obama => en:Barack_Obama) refresh the URL client-side via JavaScript. This doesn’t result in an extra HTTP request so the change should

[Analytics] Scholarly citations by PMID/PMCID in Wikipedia

2015-02-02 Thread Dario Taraborelli
Hey all, we just released a dataset of scholarly citations in the English Wikipedia by Pubmed / Pubmed Central ID. http://dx.doi.org/10.6084/m9.figshare.1299540 The dataset currently includes the first known occurrence of a PMID or PMCID citation in an English Wikipedia article and the

[Analytics] Early registration for CSCW 2015 ends January 30th

2015-01-27 Thread Dario Taraborelli
For those of you interested in attending, the early registration deadline is January 30. See also https://meta.wikimedia.org/wiki/Research:CSCW_2015 — — — CSCW 2015 | March 14-18 | Vancouver, BC, Canada http://cscw.acm.org

[Analytics] Wikimedia referrer policy

2015-01-20 Thread Dario Taraborelli
I’ve been discussing with the folks at CrossRef (the largest registry of Digital Object Identifiers, think of it as the ICANN of science) how to accurately measure the impact of traffic driven from Wikipedia/Wikimedia to scholarly resources. While digging into their data, we realized that

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Dario Taraborelli
I second Aaron’s concerns, which I previously expressed during the consultation about the new privacy policy. My main objection to the proposed solution is that by saying “Wikimedia honors DNT headers” we imply – by the most popular/de facto interpretation of DNT – that we do 3rd party tracking

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Dario Taraborelli
Ori, we are making use of the header that we think is consistent with the expectation of users based on what evidence? I’ve seen a single reference cited in this thread pointing to a study that candidly declares in its abstract: “Because Do Not Track is so new, as far as we know this is

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Dario Taraborelli
with HTTP requests. On Jan 16, 2015, at 7:54 PM, Dario Taraborelli da...@wikimedia.org wrote: Ori, we are making use of the header that we think is consistent with the expectation of users based on what evidence? I’ve seen a single reference cited in this thread pointing to a study

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Dario Taraborelli
I’m searching for references looking at user perception of third-party behavioral tracking vs logging, any pointer would be appreciated. On Jan 16, 2015, at 8:16 PM, Dario Taraborelli dtarabore...@wikimedia.org wrote: I didn’t reference the McDonald study in my reply, but I too am

[Analytics] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal

2015-01-12 Thread Dario Taraborelli
I’m sharing a proposal that Reid Priedhorsky and his collaborators at Los Alamos National Laboratory recently submitted to the Wikimedia Analytics Team aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview data dumps and making them available to the public and the research

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Dario Taraborelli
On Jan 7, 2015, at 6:42 AM, Gilles Dubuc gil...@wikimedia.org wrote: Right -- couldn't we just tag the URL? The event of the user actually viewing the image is completely disconnected from the URL hit in Media Viewer, which is why we need EL and can't rely on existing server logs.

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Dario Taraborelli
agreed. Many of these articles will see spikes in traffic during the test (as the sample includes many celebrity articles) but the historical volume of traffic for the whole sample should give us a decent estimate of the throughput. I also wouldn’t worry about any events other than

[Analytics] Article feedback corpus released

2014-12-24 Thread Dario Taraborelli
I’m glad to announce the release of an open-licensed corpus with 1.5M records from the Article Feedback v5 pilot. http://dx.doi.org/10.6084/m9.figshare.1277784 Thanks to everyone who helped make this happen, Fabrice in particular for shepherding this through. Dario — This dataset contains

[Analytics] Freebase winding down, to be ingested into Wikidata

2014-12-17 Thread Dario Taraborelli
In case you missed the announcement: https://plus.google.com/app/basic/stream/z122wpyxhob0hjoik04cc3vatw2zfv4zszk0k Dario

Re: [Analytics] Page view generalized filter draft (due Friday, Dec 12th)

2014-12-15 Thread Dario Taraborelli
Oliver, Aaron – thanks for pushing this forward! Glad that we’re moving on with the implementation. On Dec 15, 2014, at 11:32 AM, Oliver Keyes oke...@wikimedia.org wrote: Totally! On 15 December 2014 at 14:22, Andrew Otto ao...@wikimedia.org wrote: Ah cool,

[Analytics] EventLogging data QA

2014-12-11 Thread Dario Taraborelli
I am kicking off this thread after a good conversation with Nuria and Kaldari on pain points and opportunities we have around data QA for EventLogging. Kaldari, Leila and I have gone through several rounds of data QA before and after the deployment of new features on Mobile and we haven’t found

Re: [Analytics] data in Vital Signs

2014-11-04 Thread Dario Taraborelli
to add some context to the present approach, you may remember that when we defined Editor Model metrics we started from the highest possible level of aggregation (i.e. all namespaces combined, archive table included). See rationale below from a previous email exchange: we tried to stick to two

[Analytics] Wikimedia Research showcase – October 15 2014, 11.30 PT

2014-10-14 Thread Dario Taraborelli
After a break in September, we’re resuming our monthly Research and Data showcase. The next showcase will be live-streamed tomorrow Wednesday October 15 at 11.30 PT. As usual you can join the conversation via IRC on freenode.net by joining the #wikimedia-research channel. We look forward to

Re: [Analytics] Welcome Marcel Ruiz Forns to the Analytics Development team

2014-10-07 Thread Dario Taraborelli
Benvingut — looking forward to working with you, Marcel. On Oct 7, 2014, at 17:52, Jonas Xavier jonas@gmail.com wrote: Bem-vindo, Marcel! ___ Analytics mailing list Analytics@lists.wikimedia.org

Re: [Analytics] eventlogging largest tables

2014-09-29 Thread Dario Taraborelli
On Sep 27, 2014, at 11:42 AM, Aaron Halfaker ahalfa...@wikimedia.org wrote: I'm not surprised that PageContentSaveComplete is big. That's a very useful table and it sees a lot of rows for good reason (every revision saved on every wiki). As for the Multimedia/Mediaviewer tables, we

[Analytics] Ten Simple Rules for Better Figures

2014-09-12 Thread Dario Taraborelli
A no-nonsense guide to scientific data visualization published in PLOS Computational Biology http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003833 (the contents are CC-BY licensed and the source code is here: https://github.com/rougier/ten-rules ) Dario

Re: [Analytics] pitching the Gender Edit Dashboard

2014-08-31 Thread Dario Taraborelli
homophily is prevalent: editors tend to interact with other editors having similar emotional styles (e.g., editors expressing more anger connect more with one another). On Aug 29, 2014, at 11:59 AM, Dario Taraborelli dtarabore...@wikimedia.org wrote: I too recommend the use of micro-surveys

Re: [Analytics] pitching the Gender Edit Dashboard

2014-08-29 Thread Dario Taraborelli
I too recommend the use of micro-surveys. The full rationale is here [1] but one of the immediate benefits I see is the ability to randomly sample from the population of newly registered users. It shouldn’t be particularly hard to set up an ongoing gender micro-survey to collect this data over

Re: [Analytics] Public EventLogging -- LabsDB

2014-08-13 Thread Dario Taraborelli
(expanding on what I think Dan is referring to re: goals), addressing this issue would allow EEVS to access data needed to generate breakdowns for metrics by method/target site (mobile, desktop, apps). On Aug 13, 2014, at 1:40 PM, Dan Andreescu dandree...@wikimedia.org wrote: Kevin, for what

Re: [Analytics] Data inconsistency with displayMobile in ServerSideAccountCreation

2014-07-25 Thread Dario Taraborelli
Dan, we were just having a separate discussion about the fact that the various isMobile or displayMobile fields predate the launch of apps and are likely to create artifacts if used to filter app-specific events. IMO there should be a default field in the event capsule for {desktop site,

[Analytics] Monthly Research Data Showcase this Wednesday

2014-06-16 Thread Dario Taraborelli
The next Research Data showcase will be live-streamed this Wednesday 6/18 at 11.30 PT. The streaming link will be posted on the lists a few minutes before the showcase starts and as usual you can join the conversation on IRC at #wikimedia-research. We look forward to seeing you! Dario This

Re: [Analytics] [WikimediaMobile] Mobile revtags, redux

2014-06-16 Thread Dario Taraborelli
Yuvi, I confirm the issue Jeremy reported (note that both tags for app edits are correctly captured in lowercase in the database, so this is a UI only issue). Aside from this, any ETA for maintenance script #1 (removal of tagged account creations)? Thanks again for the quick turnaround,

Re: [Analytics] Data quality issues with account creation log

2014-06-13 Thread Dario Taraborelli
+1 On Jun 13, 2014, at 6:15 AM, Aaron Halfaker ahalfa...@wikimedia.org wrote: As a data consumer, I'd prefer if columns matched between EventLogging and production DBs as closely as possible, so VARBINARY sounds like a win to me. On Fri, Jun 13, 2014 at 5:39 AM, Nuria Ruiz

Re: [Analytics] Data quality issues with account creation log

2014-06-06 Thread Dario Taraborelli
Nuria I am hoping we can recover the garbled usernames from the raw JSON logs, Please have in mind that we have logs only from the last 90 days. this is not true, we have server-side data covering the whole lifespan of the latest ServerSideAccountCreation in /a/eventlogging/archive. I

Re: [Analytics] Data quality issues with account creation log

2014-06-05 Thread Dario Taraborelli
. On Thu, Jun 5, 2014 at 5:47 PM, Steven Walling swall...@wikimedia.org wrote: On Thu, Jun 5, 2014 at 1:24 PM, Dario Taraborelli dtarabore...@wikimedia.org wrote: • Use event_userId whenever possible This is really a best practice everyone should follow in all analysis. Unless you're

Re: [Analytics] Data quality issues with account creation log

2014-06-05 Thread Dario Taraborelli
and yes, I wish we had a gu_id included in ServerSideAccountCreation (assuming MediaWiki knows it by the time the event is generated) On Jun 5, 2014, at 4:39 PM, Dario Taraborelli da...@wikimedia.org wrote: I am hoping we can recover the garbled usernames from the raw JSON logs, but you’re

Re: [Analytics] Table of Wikis (for supporting cross-wiki work)

2014-06-03 Thread Dario Taraborelli
. On 3 June 2014 07:51, Dario Taraborelli dtarabore...@wikimedia.org wrote: that’s nifty, thanks Aaron. On Jun 3, 2014, at 5:14 AM, Dan Andreescu dandree...@wikimedia.org wrote: awesome! On Mon, Jun 2, 2014 at 7:49 PM, Aaron Halfaker ahalfa...@wikimedia.org wrote: I polled https

Re: [Analytics] purging old data from eventlogging db

2014-05-21 Thread Dario Taraborelli
The motivation behind your proposal is (I think) a desire to have a unified configuration interface for data collection jobs. This makes total sense and it's worth pursuing. I just don't think we should stuff everything into the schema. The schema is just that: a schema. It's a data model.

Re: [Analytics] Monthly research data showcase livestreamed today

2014-05-21 Thread Dario Taraborelli
The livestream link is http://youtu.be/AUupsnvV1oA On Wed, May 21, 2014 at 7:50 AM, Dario Taraborelli dtarabore...@wikimedia.org wrote: The next Research Data showcase https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase will be live-streamed *today Wed 5/21 at 11.30 PT

[Analytics] Post mortem: slow queries on analytics slaves

2014-05-14 Thread Dario Taraborelli
If – like me – you had jobs or queries on the analytics slaves that used to finish in seconds or minutes and suddenly started taking several minutes to hours to complete, you were probably affected by a problem with the table types used on the slaves. Sean nailed down the issue which has now

Re: [Analytics] [Multimedia] Using EventLogging for funnel analysis

2014-05-14 Thread Dario Taraborelli
On May 14, 2014, at 8:56 AM, Dan Andreescu dandree...@wikimedia.org wrote: Any suggestions for the best way to visualize the data once we have it? Are there some existing LIMN graphs that would be well-suited for a funnel analysis like this one? Or should we simply use a standard line graph

Re: [Analytics] Adding indexes to EventLogging tables

2014-05-13 Thread Dario Taraborelli
Sean, tendril is really awesome. I too would love to review the performance of some queries used for the EE dashboards. One in particular [1] used to be fairly fast and is now taking an ugly lot of time to complete, possibly due to some schema change I was unaware of. I’ll drop you a line

Re: [Analytics] db1047 one box to rule them all

2014-05-02 Thread Dario Taraborelli
Hi Gilles, you shouldn’t use “research_prod” if you simply need to perform read-only queries against the slaves (the “research” user is the one you should use instead, at least until we revisit the policy of SQL credentials with ops). I’ll drop you a line off-list with instructions on the

Re: [Analytics] db1047 one box to rule them all

2014-04-30 Thread Dario Taraborelli
On Apr 30, 2014, at 8:40 AM, Sean Pringle sprin...@wikimedia.org wrote: On Wed, Apr 30, 2014 at 12:44 PM, Oliver Keyes oke...@wikimedia.org wrote: Okay, so, have tested (to a limited degree. The work I'm doing that involves the dbs involves eventlogging, so this is mostly me making up excuses

Re: [Analytics] db1047 one box to rule them all

2014-04-30 Thread Dario Taraborelli
~ 30 hours of replag as I write but this is very exciting, thanks Sean! In case you’re wondering, the EventLogging DB is called “log” as the previous one. On Apr 30, 2014, at 11:49 AM, Oliver Keyes oke...@wikimedia.org wrote: Whee! On 30 April 2014 02:48, Sean Pringle

[Analytics] Editor activation data

2014-04-29 Thread Dario Taraborelli
(cross-posting, slightly edited to add more context) I uploaded several plots with absolute editor activation counts and editor activation rates. These are based on data we generated for the sensitivity analysis of “new user” metrics. [1]

Re: [Analytics] db1047 one box to rule them all

2014-04-29 Thread Dario Taraborelli
Sean, consolation prizes are understated, this is terrific. I just noticed that centralauth is not included, after EventLogging data this is the most useful database to have replicated on the big one box. Dario On Apr 29, 2014, at 6:31 PM, Sean Pringle sprin...@wikimedia.org wrote: On Wed,

Re: [Analytics] Pmpta going away and taking some analytics slaves with it :-)

2014-04-21 Thread Dario Taraborelli
alright, that’s very unfortunate – thanks Christian for catching this. All these slaves are critical for a variety of scripts that populate dashboards and ad-hoc analysis outside of enwiki and dewiki. I’ll immediately file an RT ticket. Dario On Apr 21, 2014, at 5:27 PM, Toby Negrin

Re: [Analytics] Pmpta going away and taking some analytics slaves with it :-)

2014-04-21 Thread Dario Taraborelli
scrap that, I see there’s already an open ticket, I’ll follow up there. On Apr 21, 2014, at 5:32 PM, Dario Taraborelli da...@wikimedia.org wrote: alright, that’s very unfortunate – thanks Christian for catching this. All these slaves are critical for a variety of scripts that populate

[Analytics] Wikimedia monthly research data showcase: live streamed tomorrow

2014-04-15 Thread Dario Taraborelli
The next Research Data showcase will be live-streamed tomorrow Wed 4/16 at 11.30 PT. The streaming link will be posted on the lists a few minutes before the showcase starts and you can join the conversation on IRC at #wikimedia-research. We look forward to seeing you! Dario This month:

Re: [Analytics] Wikipedia page views

2014-03-25 Thread Dario Taraborelli
any interpolation for missing data occurred around that date. [1] http://www.wikipediatrends.com/?query[]=Eros [2] http://stats.grok.se/en/latest90/Eros On Mar 25, 2014, at 7:09 AM, Dario Taraborelli dtarabore...@wikimedia.org wrote: On Mar 25, 2014, at 7:01 AM, Alex Druk alex.d...@gmail.com

Re: [Analytics] Wikimedia monthly research showcase: Feb 26, 11.30 PT

2014-02-26 Thread Dario Taraborelli
streaming will start in 2 minutes at: http://youtu.be/arO9YzcTWGE On Feb 25, 2014, at 6:06 PM, Dario Taraborelli da...@wikimedia.org wrote: Starting tomorrow (February 26), we will be broadcasting the monthly showcase of the Wikimedia Research and Data team. The showcase is an opportunity

[Analytics] Upcoming research newsletter: new papers open for review

2014-02-24 Thread Dario Taraborelli
feel free to get in touch off-list. Dario Taraborelli and Tilman Bayer [1] http://meta.wikimedia.org/wiki/Research:Newsletter ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

Re: [Analytics] [Wikitech-l] Exit stats?

2014-02-24 Thread Dario Taraborelli
Hi Strainu that’s correct: we do have aggregate entry/exit reports based on panel data from comScore for all Wikimedia properties, What does that mean, exactly? Strainu we obtain from comScore on a monthly basis aggregate data on the % of entries by referrer domain (e.g. YouTube -
