Hey Gergo, thanks for the heads up!

The big question here is: how does it scale? Sending events to 100 clients may
work, but does it still work for 100 thousand?

And then there are several more important details to sort out: What's the
granularity of subscription - a wiki? A page? Where does filtering by
namespace etc. happen? How big is the latency? How does recovery/re-sync work
after a disconnect or downtime?
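
To make the recovery question concrete: if the stream were exposed as
Server-Sent Events backed by something like Kafka, a client could resume after
a disconnect roughly like the sketch below. The URL, the field names, and the
assumption that the server keeps enough history to replay are all mine, not a
description of the actual service:

import json

from sseclient import SSEClient as EventSource  # pip install sseclient

# Hypothetical endpoint and event schema -- not the real service.
URL = 'https://stream.example.org/v2/stream/recentchange'

def consume(last_event_id=None):
    # SSE lets the client send the last event id it saw, so the server can
    # replay whatever was missed during a disconnect (assuming it retains
    # enough history, e.g. via Kafka retention).
    for event in EventSource(URL, last_id=last_event_id):
        if event.event != 'message' or not event.data:
            continue
        change = json.loads(event.data)
        # In this sketch, filtering by wiki and namespace happens client-side;
        # whether the server offers finer-grained subscriptions is exactly
        # the open question above.
        if change.get('wiki') == 'wikidatawiki' and change.get('namespace') == 0:
            print(change.get('title'))
        last_event_id = event.id  # persist this somewhere to survive restarts

consume()

The point being: resumability shifts much of the recovery problem onto the
server's retention window.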

I have not read the entire conversation, so the answers might already be
there - my apologies if they are; just point me to them.

Anyway, if anyone has a good solution for sending wiki-events to a large number
of subscribers, yes, please let us (WMDE/Wikidata) know about it!
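
And to illustrate the kind of consumer Andrew describes below: even without a
Spark/Flink cluster, a toy "trending pages" aggregation over a five-minute
sliding window only takes a few lines (same hypothetical endpoint and schema
as in my sketch above):

import json
import time
from collections import Counter, deque

from sseclient import SSEClient as EventSource  # pip install sseclient

URL = 'https://stream.example.org/v2/stream/recentchange'  # hypothetical
WINDOW = 300  # sliding window, in seconds

recent = deque()     # (timestamp, title) pairs still inside the window
counts = Counter()   # edits per title within the window

for event in EventSource(URL):
    if event.event != 'message' or not event.data:
        continue
    change = json.loads(event.data)
    title, now = change.get('title'), time.time()
    recent.append((now, title))
    counts[title] += 1
    # Drop edits that have fallen out of the sliding window.
    while recent and recent[0][0] < now - WINDOW:
        _, old = recent.popleft()
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
    print('top 5 in last 5 minutes:', counts.most_common(5))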

On 26.09.2016 at 22:07, Gergo Tisza wrote:
> On Mon, Sep 26, 2016 at 5:57 AM, Andrew Otto <o...@wikimedia.org> wrote:
> 
>>  A public resumable stream of Wikimedia events would allow folks
>> outside of WMF networks to build realtime stream processing tooling on top
>> of our data.  Folks with their own Spark or Flink or Storm clusters (in
>> Amazon or labs or wherever) could consume this and perform complex stream
>> processing (e.g. machine learning algorithms (like ORES), windowed trending
>> aggregations, etc.).
>>
> 
> I recall WMDE trying something similar a year ago (via PubSubHubbub) and
> getting vetoed by ops. If they are not aware yet, might be worth contacting
> them and asking if the new streaming service would cover their use cases
> (it was about Wikidata change invalidation on third-party wikis, I think).


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
