dcausse created this task.
dcausse added projects: Wikidata-Query-Service, serviceops.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  The WDQS updater has several config options to reduce the concurrency at 
which it calls the MW API.
  
  The config option `wikibase_repo_thread_pool_size` controls the size of the 
thread pool running HTTP requests.
  
  During a test of this application with zookeeper and the flink-k8s-operator 
we had to backfill around two weeks of updates, and this caused a massive load 
on the mw-api-int cluster.
  
  We lowered this value from 30 to 5, expecting the load to drop to roughly 
1/6 of what it was, but the reduction was nowhere near that. We barely saw any 
impact, which suggests that the current limits are already too high and that 
the system is limited by the endpoint capacity, not by its own settings.
  
  Looking at the code, this limit is imposed on the HTTP thread pool that is 
attached to a job task. Given that we run at a parallelism of 12, this means 
that the actual number of concurrent requests is `parallelism * 
wikibase_repo_thread_pool_size`.
  
  So we went from `30*12=360` to `5*12=60`.
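  
  The arithmetic above can be checked with a short sketch. The function name 
`effective_concurrency` is hypothetical and only illustrates the multiplication 
described in this task; it is not taken from the updater code.

```python
def effective_concurrency(parallelism: int, pool_size_per_task: int) -> int:
    """Upper bound on in-flight HTTP requests when each task owns its own pool."""
    return parallelism * pool_size_per_task


PARALLELISM = 12  # Flink parallelism quoted in this task

before = effective_concurrency(PARALLELISM, 30)  # pool size before the change
after = effective_concurrency(PARALLELISM, 5)    # pool size after the change

print(f"before={before} after={after}")
```

  Even after the change, the upper bound is still 60 concurrent requests, 
not 5.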
  
  We should definitely change how this is configured to take the Flink 
parallelism into account.
  
  AC:
  
  - the updater should have a single option to control the MW requests 
concurrency
  - we should probably not run the AsyncOp over all the 12 tasks
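  
  One possible way to satisfy both criteria, sketched under assumptions: a 
single global limit is split across the tasks that run the async operator, and 
the operator parallelism can be capped below the job parallelism. All names 
here (`plan_async_op`, `AsyncOpPlan`, `max_concurrent_requests`, 
`max_operator_parallelism`) are hypothetical and do not exist in the updater; 
this is not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AsyncOpPlan:
    """Hypothetical result: how many tasks run the async op, and their pool size."""
    operator_parallelism: int
    pool_size_per_task: int

    @property
    def max_concurrent_requests(self) -> int:
        return self.operator_parallelism * self.pool_size_per_task


def plan_async_op(
    max_concurrent_requests: int,
    job_parallelism: int,
    max_operator_parallelism: Optional[int] = None,
) -> AsyncOpPlan:
    """Derive per-task settings from one global limit (assumed option)."""
    if max_concurrent_requests < 1 or job_parallelism < 1:
        raise ValueError("limits must be >= 1")
    if max_operator_parallelism is not None and max_operator_parallelism < 1:
        raise ValueError("max_operator_parallelism must be >= 1")

    # Never run the operator on more tasks than the job has.
    cap = job_parallelism
    if max_operator_parallelism is not None:
        cap = min(cap, max_operator_parallelism)

    # Each task needs at least one thread, so the global limit also caps
    # the number of tasks.
    operator_parallelism = min(cap, max_concurrent_requests)

    # Floor division keeps the total at or below the global limit.
    pool_size = max_concurrent_requests // operator_parallelism
    return AsyncOpPlan(operator_parallelism, pool_size)


# Global limit of 30, async op restricted to 3 of the 12 tasks.
plan = plan_async_op(30, job_parallelism=12, max_operator_parallelism=3)
print(plan, plan.max_concurrent_requests)
```

  Floor division is used so that the product never exceeds the configured 
limit; the cost is that part of the budget goes unused when the limit is not a 
multiple of the operator parallelism.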

TASK DETAIL
  https://phabricator.wikimedia.org/T346456

To: dcausse
Cc: Aklapper, bking, Clement_Goubert, dcausse, Kappakayala, AWesterinen, 
Arnoldokoth, wkandek, JMeybohm, Namenlos314, jijiki, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles