Gehel added a comment.
>> In T280485#7506072 <https://phabricator.wikimedia.org/T280485#7506072>, @akosiaris wrote:
>>
>>> Is T280485#7275149 <https://phabricator.wikimedia.org/T280485#7275149> related to blazegraph and not flink ? I am not sure what 13B triplets vs 2.8T triples means storage wise and in which context.

Oh, I now see the confusion! The initial message had the wrong units (a typo).

The current Flink updater takes data from Wikidata, which has ~13B triples. The new Flink updater will add support for getting data from Commons, which has ~2.8B triples. So the new updater will add ~20% more resource consumption (assuming a linear cost). This will mean:

- additional storage on Swift (I assume this is trivial given the size of Swift and can be ignored)
- additional CPU / RAM usage on k8s
- additional local storage (/tmp) on the containers

It isn't clear to me yet whether our strategy is to increase the size of the current Flink cluster or to have a new cluster dedicated to the Commons updater (to be decided later today). Duplicating the existing cluster would provide additional isolation between the two workflows. It is also the worst-case scenario in terms of resources needed. The estimated additional resources are:

- manager: 1 more pod at 1.6G RAM, cpu: 500m
- workers: 3 pods at 2.1G RAM, cpu: 1000m

TASK DETAIL
  https://phabricator.wikimedia.org/T280485

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel
Cc: akosiaris, Zbyszko, Aklapper, RKemper, Gehel, MPhamWMF, wkandek, JMeybohm, CBogen, Namenlos314, jijiki, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Dzahn
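
For anyone who wants to re-check the figures in the comment above, here is a minimal sketch of the arithmetic. It only uses the numbers quoted in the comment and keeps its assumption of a linear cost. The names (`relative_increase`, `total_extra`, `EXTRA_PODS`) are made up for illustration, and "G" is taken as written, without guessing between GB and GiB.

```python
# Triple counts quoted in the comment (approximate).
WIKIDATA_TRIPLES = 13e9
COMMONS_TRIPLES = 2.8e9

# Worst-case extra pods quoted in the comment:
# role -> (pod count, RAM per pod in G, CPU per pod in millicores)
EXTRA_PODS = {
    "manager": (1, 1.6, 500),
    "workers": (3, 2.1, 1000),
}


def relative_increase(current, added):
    """Fractional growth in data volume, assuming cost scales linearly."""
    return added / current


def total_extra(pods):
    """Sum RAM (G) and CPU (millicores) over all extra pods."""
    total_ram = sum(count * ram for count, ram, _cpu in pods.values())
    total_cpu = sum(count * cpu for count, _ram, cpu in pods.values())
    return total_ram, total_cpu


if __name__ == "__main__":
    growth = relative_increase(WIKIDATA_TRIPLES, COMMONS_TRIPLES)
    ram, cpu = total_extra(EXTRA_PODS)
    print(f"growth: {growth:.1%}, extra RAM: {ram:.1f}G, extra CPU: {cpu}m")
```

With these inputs the growth comes out at roughly 21.5%, consistent with the "~20%" in the comment, and the duplicated-cluster case adds up to about 7.9G of RAM and 3500m of CPU in total.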
