Gehel added a comment.

  >> In T280485#7506072 <https://phabricator.wikimedia.org/T280485#7506072>,
  >> @akosiaris wrote:
  >>
  >>> Is T280485#7275149 <https://phabricator.wikimedia.org/T280485#7275149>
  >>> related to blazegraph and not flink ? I am not sure what 13B triplets vs
  >>> 2.8T triples means storage wise and in which context.
  
  Oh, I now see the confusion! Wrong units (typo) in the initial message. The 
current Flink updater takes data from Wikidata, which has ~13B triples. The new 
Flink updater will add support for getting data from Commons, which has ~2.8B 
triples. So the new updater will add ~20% more resource consumption (assuming a 
linear cost).
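  A quick check of the ~20% figure, as a minimal Python sketch. It assumes, as
stated above, that cost scales linearly with triple count, and uses the
approximate counts quoted (~13B for Wikidata, ~2.8B for Commons). The constant
and function names are illustrative only.

```python
# Approximate triple counts quoted in the comment above.
WIKIDATA_TRIPLES = 13e9   # ~13B triples (current updater)
COMMONS_TRIPLES = 2.8e9   # ~2.8B triples (added by the new updater)


def added_load_ratio(current: float, additional: float) -> float:
    """Extra load as a fraction of current load.

    Assumes resource cost grows linearly with the number of triples.
    """
    return additional / current


ratio = added_load_ratio(WIKIDATA_TRIPLES, COMMONS_TRIPLES)
print(f"Commons adds ~{ratio:.1%} on top of Wikidata")
```

  This prints roughly 21.5%, which is where the "~20% more" estimate comes
from.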
  
  This will mean:
  
  - additional storage on Swift (I assume this is trivial given the size of 
Swift and can be ignored)
  - additional CPU / RAM usage on k8s
  - additional local storage (/tmp) on the containers
  
  It isn't super clear to me whether our strategy is to increase the size of
the current Flink cluster or to have a new cluster dedicated to the Commons
updater (to be decided later today).
  
  Duplicating the existing cluster would provide additional isolation between
the 2 workflows. This is also the worst-case scenario in terms of resources
needed. The estimated additional resources are:
  
  - manager: 1 more pod at 1.6G RAM, CPU: 500m
  - workers: 3 pods at 2.1G RAM, CPU: 1000m
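
  Summing these figures gives the total extra footprint of a duplicated
cluster. This is a hypothetical Python sketch: it assumes the RAM and CPU
values are per pod (so the 3 workers need 3x each), and it keeps "G" as the
unit written above without converting between G and Gi. The PodGroup type and
totals() helper are illustrative names, not part of any deployment tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodGroup:
    name: str
    count: int            # number of pods in this group
    ram_g: float          # RAM per pod, in "G" as quoted (assumed per pod)
    cpu_millicores: int   # CPU per pod, in Kubernetes millicores


# Figures from the estimate above; the per-pod reading is an assumption.
EXTRA_PODS = [
    PodGroup("manager", count=1, ram_g=1.6, cpu_millicores=500),
    PodGroup("worker", count=3, ram_g=2.1, cpu_millicores=1000),
]


def totals(groups: list[PodGroup]) -> tuple[float, int]:
    """Return (total RAM in G, total CPU in millicores) across all pods."""
    ram = sum(g.count * g.ram_g for g in groups)
    cpu = sum(g.count * g.cpu_millicores for g in groups)
    return ram, cpu


ram, cpu = totals(EXTRA_PODS)
print(f"extra RAM: {ram:.1f}G, extra CPU: {cpu}m ({cpu / 1000:g} cores)")
```

  Under those assumptions the worst case comes to about 7.9G of RAM and 3500m
(3.5 cores) of CPU on top of the current cluster.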

TASK DETAIL
  https://phabricator.wikimedia.org/T280485

To: Gehel
Cc: akosiaris, Zbyszko, Aklapper, RKemper, Gehel, MPhamWMF, wkandek, JMeybohm, 
CBogen, Namenlos314, jijiki, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Dzahn
_______________________________________________
Wikidata-bugs mailing list