Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Jeremy Baron
On Dec 13, 2014 12:33 PM, "Aaron Halfaker" wrote: > 1. It turns out that generating diffs is computationally complex, so generating them in real time is slow and lame. I'm working to generate all diffs historically using Hadoop and then have a live system listening to recent changes to keep the d

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Mitar
Hi! Now with full diffs as well. Mitar On Sat, Dec 13, 2014 at 8:28 PM, Mitar wrote: > Hi! > > I made a Meteor DDP API to the stream of recent changes on all > Wikimedia wikis. Now you can simply use DDP.connect on in your Meteor > application to connect to stream of changes on Wikipedia. Yo

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
http://ironholds.org/misc/pageviews_year_and_week.png - fascinating! It reveals a lot of seasonality in the desktop views - again, not replicated on mobile (at least, not so strongly) On 13 December 2014 at 13:49, Oliver Keyes wrote: > > Ooh, that's a really good point. In fact, we know there's d

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Mitar
Hi! I made a Meteor DDP API to the stream of recent changes on all Wikimedia wikis. Now you can simply use DDP.connect on in your Meteor application to connect to stream of changes on Wikipedia. You can use MongoDB queries to limit only to those changes you are interested in. If there is interes

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Thomas Steiner
Hi all, +1 to Ed's point on making it a parameter rather than a new endpoint. I would definitely use it. Currently, I share a Server-Sent Events API connection for my projects (and invite others to use it, too: http://wikipedia-edits.herokuapp.com/sse). Thanks, Tom -- Dr. Thomas Steiner, Emplo
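For anyone who wants to consume a Server-Sent Events endpoint like the one Tom mentions, here is a minimal sketch of the parsing side in Python. It only handles `data:` fields and comment lines. The JSON keys in the sample (`article`, `language`) are made up for illustration and are not taken from the actual stream.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Hypothetical sample input; a real client would iterate over the
# lines of the HTTP response body instead.
sample = [
    ": keep-alive\n",
    'data: {"article": "Example", "language": "en"}\n',
    "\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]
events = list(iter_sse_data(sample))
first = json.loads(events[0])
```

Keeping the parser independent of the HTTP layer means it can be fed from any line iterator, which also makes it easy to test offline.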

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
Ooh, that's a really good point. In fact, we know there's different behaviour - mobile rises on weekends, desktop falls, but the desktop fall > the mobile rise. I'm knee-deep in adjusted R2 values right now but I'll visualise that way and see what happens :) On 13 December 2014 at 13:17, Ed Summer

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Ed Summers
It might be interesting to bucket by week to see if you still see the difference in clustering between desktop and mobile. I wonder if it’s a result of different behavior on desktop/mobile on weekdays/weekends? //Ed > On Dec 13, 2014, at 12:37 PM, Oliver Keyes wrote: > > Bah, you're right! Wi
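Bucketing by week, as suggested here, could look roughly like the following in Python. This is only a sketch: it assumes the daily counts are already available as a mapping from UTC date to pageviews, and the numbers are invented.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


def bucket_by_week(daily: Dict[date, int]) -> Dict[Tuple[int, int], int]:
    """Sum daily counts into (ISO year, ISO week) buckets."""
    weekly: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, views in daily.items():
        iso = day.isocalendar()
        # iso[0] is the ISO year, iso[1] the ISO week number.
        weekly[(iso[0], iso[1])] += views
    return dict(weekly)


# Invented counts, for illustration only.
daily_views = {
    date(2014, 12, 8): 10,   # Monday
    date(2014, 12, 13): 5,   # Saturday, same ISO week
    date(2014, 12, 15): 7,   # Monday of the following week
}
weekly_views = bucket_by_week(daily_views)
```

Running this separately for the desktop and mobile series would give two weekly series to compare, with the weekday/weekend effect summed out within each bucket.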

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Oliver Keyes
I'd be interested in helping if we could generalise it! You can probably get a substantial speed improvement in C or C++. C and C++ are generalisable to Python and R, our primary working languages for analytics. And R lacks any kind of text diffing engine, so I've been distinctly looking into how

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
Bah, you're right! Will reupload. Pageviews are bucketed by UTC day, although the axis is by months to avoid making it essentially unreadable. It's generated in ggplot2 using theme_bw() (one of my favourite combinations) On 13 December 2014 at 12:33, Ed Summers wrote: > > > > On Dec 13, 2014, at

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Aaron Halfaker
Hey folks, I've been working on building up a revision diffs service that you'd be able to listen to or download a dump of revision diffs. See https://github.com/halfak/Difference-Engine for my progress on the live system and https://github.com/halfak/MediaWiki-Streaming for my progress developin

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Ed Summers
> On Dec 13, 2014, at 12:18 PM, Oliver Keyes wrote: > > I'm not sure what this means (desktop users are weird? There's a lot of bot > traffic we're not catching? That's my guess) but I thought it was pretty and > might provoke some hypothesising. So, here you go! I think the axis labels are f

[Wiki-research-l] Pageviews, mobile versus desktop

2014-12-13 Thread Oliver Keyes
A graph I just generated while messing around with the high-granularity data we used in the monthly metrics readership report: http://ironholds.org/misc/pageviews_trends.png The thing I find really interesting about this is not the trend (mobile up, desktop down. As Lehrer said, this we know from

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Ed Summers
+1 Yuvi About a year ago I put together a little program that identified .uk external links in Wikipedia’s changes for the web archiving folks at the British Library. Because it needed to fetch the diff for each change I never pushed it very far, out of concerns for the API traffic. I never ask
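The kind of filtering described here, picking out newly added .uk external links from a change, can be sketched in Python as below. This is not Ed's program. It assumes the old and new revision text are both at hand (which is exactly the part that required fetching a diff per change), and the URL regex is a rough approximation rather than a full wikitext parser.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Rough URL matcher: stops at whitespace and common wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\[\]<>|{}\"]+")


def added_uk_links(old_text: str, new_text: str) -> List[str]:
    """Return URLs with a .uk host that appear in new_text but not old_text."""
    old_urls = set(URL_RE.findall(old_text))
    added: List[str] = []
    for url in URL_RE.findall(new_text):
        host = urlsplit(url).hostname or ""
        if url not in old_urls and host.endswith(".uk") and url not in added:
            added.append(url)
    return added


# Made-up revision text, for illustration only.
old_rev = "See [http://example.org/a A] and [http://www.bl.uk/ BL]."
new_rev = old_rev + " Also [http://www.example.co.uk/page Page] and http://example.com/uk."
new_links = added_uk_links(old_rev, new_rev)
```

Checking the parsed hostname rather than the raw string avoids false positives such as a path that merely ends in `uk`.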

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Oliver Keyes
Oh dear god, that would be incredible. The non-streaming API has a wonderful bug: if you request a series of diffs, and there are >1 uncached diffs in that series, only the first uncached diff will be returned. For the rest it returns...an error? No. Some kind of special value? No. It returns an e

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Scott Hale
Great idea, Yuvi. Speaking as someone who just downloaded diffs for a month of data from the streaming API for a research project, I certainly could see an 'augmented stream' with diffs included being very useful for research and also for bots. On Sat, Dec 13, 2014 at 10:52 PM, Yuvi Panda wrote:

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Yuvi Panda
On Sat, Dec 13, 2014 at 2:34 PM, Yuvi Panda wrote: > If a lot of people are doing this, then perhaps it makes sense to have > an 'augmented real time streaming' interface that is an exact replica > of the streaming interface but with diffs added. Or rather, if I were to build such a thing, would

Re: [Wiki-research-l] How to track all the diffs in real time?

2014-12-13 Thread Yuvi Panda
If a lot of people are doing this, then perhaps it makes sense to have an 'augmented real time streaming' interface that is an exact replica of the streaming interface but with diffs added.
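One way to picture such an 'augmented' stream is a thin wrapper that passes each change event through unchanged and attaches a diff. The Python sketch below makes several assumptions: the event keys `old_revid` and `new_revid` are hypothetical, `get_text` stands in for whatever would actually fetch revision text, and `difflib` from the standard library is used only as a stand-in diff engine, not as what a production service would run.

```python
import difflib
from typing import Callable, Dict, Iterable, Iterator


def augment_with_diffs(
    changes: Iterable[Dict],
    get_text: Callable[[int], str],
) -> Iterator[Dict]:
    """Yield each change event with an added 'diff' field."""
    for change in changes:
        # 'old_revid' and 'new_revid' are hypothetical field names.
        old_lines = get_text(change["old_revid"]).splitlines()
        new_lines = get_text(change["new_revid"]).splitlines()
        diff = list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
        # Copy the event so the original stream payload is left untouched.
        yield {**change, "diff": diff}


# In-memory stand-in for a revision store, for illustration only.
texts = {1: "Hello world\nSecond line", 2: "Hello wiki\nSecond line"}
changes = [{"title": "Example", "old_revid": 1, "new_revid": 2}]
augmented = list(augment_with_diffs(changes, texts.__getitem__))
```

Because the original fields are preserved, a consumer written against the plain stream would keep working and could simply ignore the extra field.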