[Wikidata-bugs] [Maniphest] T280485: Additional capacity on the k8s Flink cluster for WCQS updater

akosiaris Wed, 17 Nov 2021 07:00:31 -0800

akosiaris added a comment.


  In T280485#7510193 <https://phabricator.wikimedia.org/T280485#7510193>, 
@Gehel wrote:
  
  >>> In T280485#7506072 <https://phabricator.wikimedia.org/T280485#7506072>, 
@akosiaris wrote:
  >>>
  >>>> Is T280485#7275149 <https://phabricator.wikimedia.org/T280485#7275149> 
related to Blazegraph and not Flink? I am not sure what 13B triples vs 2.8T 
triples means storage-wise and in which context.
  >
  > Oh, I now see the confusion! Wrong units (typo) in the initial message. The 
current Flink updater takes data from Wikidata, which has ~13B triples.
  
  😆😆. OK, thanks for clearing that up. That 200 times increase had me worried.
  
  > The new Flink updater will add support for getting data from Commons, which 
has ~2.8B triples. So the new updater will add ~20% more resource consumption 
(assuming a linear cost).
  
  OK, that's nothing then.
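  
  For reference, a rough check of the figures quoted above. This is only a 
sketch: the variable and function names are made up for illustration, and it 
takes the "assuming a linear cost" premise at face value.

```python
# Approximate triple counts quoted in this thread.
WIKIDATA_TRIPLES = 13e9    # ~13B, current Flink updater input
COMMONS_TRIPLES = 2.8e9    # ~2.8B, added by the new updater
TYPO_TRIPLES = 2.8e12      # the "2.8T" figure from the initial message (typo)


def relative_increase(base: float, extra: float) -> float:
    """Extra load as a fraction of the base, assuming cost is linear in triples."""
    return extra / base


commons_share = relative_increase(WIKIDATA_TRIPLES, COMMONS_TRIPLES)
typo_factor = TYPO_TRIPLES / WIKIDATA_TRIPLES

print(f"Commons adds about {commons_share:.1%} on top of Wikidata")
print(f"The mistyped figure would have been about {typo_factor:.0f}x Wikidata")
```

  So the corrected number is an increase of roughly a fifth, while the 
mistyped one would have been a couple of hundred times the current size.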
  
  > This will mean:
  >
  > - additional storage on Swift (I assume this is trivial given the size of 
Swift and can be ignored)
  > - additional CPU / RAM usage on k8s
  > - additional local storage (/tmp) on the containers
  
  We have enough of all 3 of those, no worries there.
  
  > It isn't super clear to me if our strategy is to increase the size of the 
current Flink cluster, or have a new cluster dedicated to the Commons updater 
(to be decided later today).
  
  Cool. Let us know what you decide. On our side, it probably isn't much more 
than 1 more deployment on the k8s cluster. That being said, and assuming my 
memory is up to date with how Flink works in session cluster mode, I'd expect 
it's able to handle this internally without needing another k8s deployment. 
It's also fine to increase the number of worker pods if that's something that 
would make things easier for you.
  
  > Duplicating the existing cluster would provide additional isolation between 
the 2 workflows. This is also the worst case scenario in terms of resources 
needed. The additional estimated resources are:
  >
  > - manager: 1 more pod at 1.6G RAM, cpu: 500m
  > - workers: 3 pods at 2.1G RAM, cpu: 1000m
  
  Even in this case, we have that capacity.
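  
  To put a total on the worst case quoted above, here is a minimal sketch. It 
assumes the memory and CPU figures are per pod (the message does not say so 
explicitly), and `PodSpec` and `totals` are hypothetical names used only for 
this illustration, not anything from the actual deployment charts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    name: str
    replicas: int
    memory_gb: float      # ASSUMPTION: per-pod memory, in G as quoted
    cpu_millicores: int   # ASSUMPTION: per-pod CPU, in k8s millicores


# Extra pods needed if the existing Flink cluster is duplicated.
EXTRA_PODS = [
    PodSpec("manager", replicas=1, memory_gb=1.6, cpu_millicores=500),
    PodSpec("workers", replicas=3, memory_gb=2.1, cpu_millicores=1000),
]


def totals(pods):
    """Sum memory (G) and CPU (millicores) over all replicas."""
    memory = sum(p.replicas * p.memory_gb for p in pods)
    cpu = sum(p.replicas * p.cpu_millicores for p in pods)
    return memory, cpu


memory_gb, cpu_m = totals(EXTRA_PODS)
print(f"extra memory: {memory_gb:.1f}G, extra cpu: {cpu_m}m")
```

  Under that per-pod reading, duplicating the cluster costs a bit under 8G of 
RAM and 3.5 cores in total.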

TASK DETAIL
  https://phabricator.wikimedia.org/T280485

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, akosiaris
Cc: akosiaris, Zbyszko, Aklapper, RKemper, Gehel, MPhamWMF, wkandek, JMeybohm, 
CBogen, Namenlos314, jijiki, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Dzahn

_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
