AndrewTavis_WMDE added a comment.

  Posted a large discussion about this in the `data-engineering-collab` channel 
on Slack; the general findings are:
  - The public Superset instance isn't suitable for this at this time, and 
there's no timetable for it to be (see above comments)
  - A suggestion of putting this information on Wikistats 
<> was agreed to be too complex to 
set up and manage
    - We would need to use AQS 2 (Analytics Query Service) to make a 
service/API for this
  - An initial suggestion from WMDE to target Prometheus with the DAG was 
decided against
    - It is possible to push data to Prometheus, but there are many 
complications with this
  - A new suggestion is to leverage Turnilo 
<> for this
    - There is a private instance at 
    - There are also public instances of this as seen at <>
      - Wikitech docs for this can be found at
      - The Turnilo dashboard is hosted on Cloud VPS 
      - The code for the Turnilo instance can be found at 
    - The way this would be achieved is that we would have the published 
datasets <> folder be 
another target of the DAG jobs, and we'd then ingest this data via the Turnilo 
instance
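
  As a rough sketch of that publishing step (all paths and names below are 
hypothetical placeholders, not the actual published-datasets layout or DAG 
code), each DAG job would copy its output into the published folder in 
addition to its normal target, and Turnilo would ingest from there:

```python
import shutil
from pathlib import Path

def publish_dataset(output_file: str, published_root: str, dataset_name: str) -> Path:
    """Copy a DAG job's output file into the published-datasets folder.

    The directory layout here is an assumption for illustration; the real
    layout is whatever the published datasets folder actually uses.
    """
    dest_dir = Path(published_root) / dataset_name
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(output_file).name
    shutil.copy2(output_file, dest)  # copy2 preserves timestamps, useful for ingestion bookkeeping
    return dest
```

  In an Airflow DAG this would just be one extra task appended after the 
existing job, so the same data lands in both places without changing the 
current outputs.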
  This sounds like a good way forward, but it raises the question of who would 
set up and then maintain the Turnilo instance. A bigger question is: how much 
of the data pipelines' output is supposed to be public, and would putting it 
all on a single Turnilo instance work well for our requirements?



To: AndrewTavis_WMDE
Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, S8321414, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331