Gehel created this task.
Gehel added projects: Wikidata-Query-Service, Wikidata, Operations.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Our evaluation and proof of concept around Flink is moving forward. We need 
to start thinking about a deployment strategy. There are still a lot of 
unknowns: this is the start of the discussion, not a plan yet.
  
  **Context / problem space**
   See T244590 <https://phabricator.wikimedia.org/T244590> for the larger 
context.
  
  The new WDQS update strategy is an event-driven application. This is a stream 
processing application that needs facilities for reordering events, handling 
late events, and managing state and checkpointing. Flink 
<https://flink.apache.org/> provides for all those needs, was already 
envisioned as part of the Event Platform 
<https://wikitech.wikimedia.org/wiki/Event_Platform> and is also being looked 
at by CPT for similar use cases.
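
  To make "reordering of events" and "late events" concrete, here is a minimal 
sketch in plain Python. It is not Flink code and does not use Flink's API; it 
only illustrates the watermark idea that stream processors of this kind rely 
on. The event shape `(event_time, payload)`, the function name `reorder` and 
the `max_out_of_orderness` parameter are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (event_time, payload).
Event = Tuple[int, str]


def reorder(events: Iterable[Event], max_out_of_orderness: int,
            late: List[Event]) -> Iterator[Event]:
    """Yield events in event-time order; collect too-late events in `late`."""
    buffer: List[Event] = []       # min-heap ordered by event time
    max_seen = None                # highest event time observed so far
    watermark = float("-inf")      # no event at or before this time is expected

    for event in events:
        event_time = event[0]
        if event_time <= watermark:
            # Arrived after the watermark passed its timestamp: a late event.
            late.append(event)
            continue
        heapq.heappush(buffer, event)
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness
        # Everything at or before the watermark can be emitted in order.
        while buffer and buffer[0][0] <= watermark:
            yield heapq.heappop(buffer)

    # End of input: flush whatever is still buffered, in order.
    while buffer:
        yield heapq.heappop(buffer)


if __name__ == "__main__":
    incoming = [(1, "a"), (3, "b"), (2, "c"), (7, "d"), (4, "e"), (8, "f")]
    late_events: List[Event] = []
    print(list(reorder(incoming, 2, late_events)))
    print(late_events)
```

  The buffer is the state that has to survive a restart, which is why state 
management and checkpointing appear in the same list of needs.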
  
  **Requirements**
  In our use case, Flink requires:
  
  - compute resources (CPU / RAM)
    - no numbers yet on how many resources we need, but the expectation is that 
the requirements will be similar to those of the current updater, which 
shares resources with Blazegraph
  - some local storage for state (which can be considered as transient)
    - our current estimate is that local state will be < 1 GB, but this needs 
to be refined
  - shared storage for checkpointing
    - the current strategy is to use HDFS, but other backends can be supported 
(NFS, Cassandra, ...)
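
  The split between transient local state and shared checkpoint storage can be 
sketched as follows. This is a plain Python illustration under stated 
assumptions, not how Flink implements checkpoints: the function names, the 
`chk-<id>.json` file naming and the JSON encoding are all hypothetical, and a 
local directory stands in for the shared backend (HDFS, NFS, ...).

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_checkpoint(directory: Path, checkpoint_id: int, state: dict) -> Path:
    """Persist `state` as checkpoint `checkpoint_id` in the shared directory."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"chk-{checkpoint_id:08d}.json"
    # Write to a temporary file first, then rename, so that a reader never
    # sees a half-written checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, target)
    return target


def restore_latest(directory: Path) -> Optional[dict]:
    """Return the state of the most recent checkpoint, or None if none exist."""
    # Zero-padded ids make lexicographic order match numeric order.
    candidates = sorted(directory.glob("chk-*.json"))
    if not candidates:
        return None
    with candidates[-1].open() as handle:
        return json.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        shared = Path(scratch) / "checkpoints"
        write_checkpoint(shared, 1, {"offset": 10})
        write_checkpoint(shared, 2, {"offset": 25})
        # A fresh worker with empty local storage resumes from here.
        print(restore_latest(shared))
```

  Because a worker can rebuild its local state from the latest checkpoint, the 
local storage can be treated as disposable, while the checkpoint backend has 
to be reachable from every node that might run the job.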
  
  **Dependencies**
  
  - initial state is expected to be loaded from HDFS on our Hadoop cluster
  - Kafka (-main or -jumbo) to consume various event streams
  - Wikidata to enrich events with actual content
  - Kafka to produce TTL stream
  - some system (TBD) for checkpoint storage
  
  **Strategies**
  Since we don't have experience with Flink yet, the longer-term use cases are 
still undefined, and addressing the updater issues for WDQS is time sensitive, 
it might make sense to start with a short-term intermediate solution and 
evolve it into a longer-term solution.
  
  - k8s: since Flink itself has no persistent state, it might be a candidate 
for k8s. Native Kubernetes support in Flink still seems to be experimental, 
but a standalone deployment seems viable
  - dedicated Flink cluster on new hardware (just for the WDQS use case)
  - shared Flink cluster on new hardware (shared cluster for WDQS and CPT use 
cases + additional future use cases)
  - dedicated Flink cluster collocated on existing WDQS hardware

TASK DETAIL
  https://phabricator.wikimedia.org/T247058

To: Gehel
Cc: Aklapper, dcausse, Zbyszko, Gehel, darthmon_wmde, Legado_Shulgin, Nandana, 
Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, 
Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, 
Mbch331, Rxy, Jay8g, fgiunchedi