Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Yuvi Panda
On Sat, Dec 13, 2014 at 2:34 PM, Yuvi Panda yuvipa...@gmail.com wrote: If a lot of people are doing this, then perhaps it makes sense to have an 'augmented real time streaming' interface that is an exact replica of the streaming interface but with diffs added. Or rather, if I were to build

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Scott Hale
Great idea, Yuvi. Speaking as someone who just downloaded diffs for a month of data from the streaming API for a research project, I certainly could see an 'augmented stream' with diffs included being very useful for research and also for bots. On Sat, Dec 13, 2014 at 10:52 PM, Yuvi Panda

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Oliver Keyes
Oh dear god, that would be incredible. The non-streaming API has a wonderful bug: if you request a series of diffs, and there are >1 uncached diffs in that series, only the first uncached diff will be returned. For the rest it returns...an error? No. Some kind of special value? No. It returns an

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Ed Summers
+1 Yuvi About a year ago I put together a little program that identified .uk external links in Wikipedia’s changes for the web archiving folks at the British Library. Because it needed to fetch the diff for each change I never pushed it very far, out of concerns for the API traffic. I never

[Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
A graph I just generated while messing around with the high-granularity data we used in the monthly metrics readership report: http://ironholds.org/misc/pageviews_trends.png The thing I find really interesting about this is not the trend (mobile up, desktop down. As Lehrer said, this we know from

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Aaron Halfaker
Hey folks, I've been working on building up a revision diffs service that would let you either listen to a live feed or download a dump of revision diffs. See https://github.com/halfak/Difference-Engine for my progress on the live system and https://github.com/halfak/MediaWiki-Streaming for my progress

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Ed Summers
On Dec 13, 2014, at 12:18 PM, Oliver Keyes oke...@wikimedia.org wrote: I'm not sure what this means (desktop users are weird? There's a lot of bot traffic we're not catching? That's my guess) but I thought it was pretty and might provoke some hypothesising. So, here you go! I think the

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
Bah, you're right! Will reupload. Pageviews are bucketed by UTC day, although the axis is by months to avoid making it essentially unreadable. It's generated in ggplot2 using theme_bw() (one of my favourite combinations) On 13 December 2014 at 12:33, Ed Summers e...@pobox.com wrote: On Dec

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
Ooh, that's a really good point. In fact, we know there's different behaviour - mobile rises on weekends, desktop falls, but the desktop fall the mobile rise. I'm knee-deep in adjusted R2 values right now but I'll visualise that way and see what happens :) On 13 December 2014 at 13:17, Ed

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Mitar
Hi! I made a Meteor DDP API to the stream of recent changes on all Wikimedia wikis. Now you can simply use DDP.connect in your Meteor application to connect to the stream of changes on Wikipedia. You can use MongoDB queries to limit the stream to only those changes you are interested in. If there is

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
http://ironholds.org/misc/pageviews_year_and_week.png - fascinating! It reveals a lot of seasonality in the desktop views - again, not replicated on mobile (at least, not so strongly) On 13 December 2014 at 13:49, Oliver Keyes oke...@wikimedia.org wrote: Ooh, that's a really good point. In

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Jeremy Baron
On Dec 13, 2014 12:33 PM, Aaron Halfaker ahalfa...@wikimedia.org wrote: 1. It turns out that generating diffs is computationally complex, so generating them in real time is slow and lame. I'm working to generate all diffs historically using Hadoop and then have a live system listening to recent