EBernhardson added a comment.

  All DAGs are now enabled, and each has completed at least one full execution.
  
  - Increased the partition count on map_subgraph_queries to 2048. The largest
shuffle is ~600GB, and 600GB / 2048 partitions is roughly 300MB per partition,
which brings the per-task work into the desired 256-512MB range.
  - Increased executor memory on map_subgraph_queries from 8g to 12g. Many
executors were shown in red in the Spark UI, with >10% of their task time spent
in GC. This often leads to intermittent failures that become more frequent as
data sizes grow; 12g appears to keep most executors out of the red state.
  - Seeing intermittent failures in map_subgraph_queries. Spark's internal
retries usually work through them, but some failures have escalated to the
Airflow retry level. If this persists, we might want to increase the timeout for
waiting on the shuffle server. Spark 3.0 may have addressed this issue with
https://issues.apache.org/jira/browse/SPARK-24355
  - Let the analytics team know that we have a few new high-resource jobs
running. These jobs all run in the `sequential` pool, so they shouldn't cause
any downstream issues, but it seemed appropriate to give them a heads-up.
  - Switched SubgraphQueryMapper from coalesce to repartition, for the same
reason as in the weekly DAG: the final jobs were hitting OOMs, and letting them
compute with the full partition count allows them to complete, at the cost of
an additional shuffle.
  - Removed `wiki=wikidata` from the sparql event partition specification in
subgraph_and_query_metrics. This table has no wiki column; instead it is
limited to wdqs (TODO: is that true? Can wcqs end up in here?), which is
implicitly limited to wikidata.
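
  The partition-count arithmetic from the first bullet can be checked with a
short sketch. It assumes binary units (1GB = 1024^3 bytes) and a hypothetical
policy of rounding the partition count up to a power of two; the helper names
are illustrative, not part of the actual DAG code.

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2

def partitions_for(shuffle_bytes: int, target_bytes: int) -> int:
    """Smallest power of two that keeps each partition <= target_bytes (hypothetical policy)."""
    needed = math.ceil(shuffle_bytes / target_bytes)
    return 1 << (needed - 1).bit_length()

def per_partition_mb(shuffle_bytes: int, partitions: int) -> float:
    """Average shuffle data per partition, in MB."""
    return shuffle_bytes / partitions / MB

# ~600GB largest shuffle, upper bound of the 256-512MB target range
n = partitions_for(600 * GB, 512 * MB)
print(n, round(per_partition_mb(600 * GB, n)))  # 2048 300
```

  With 2048 partitions each task handles about 300MB of shuffle data on
average, inside the target range; skewed keys can still produce larger
individual partitions.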
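
  The "red" executors in the memory bullet refer to the Spark UI's Executors
page, which highlights an executor whose GC time exceeds 10% of its total task
time. A minimal sketch of that check, assuming per-executor task and GC times
are already available (the ExecutorStats record and sample values are
hypothetical):

```python
from dataclasses import dataclass

GC_RED_FRACTION = 0.10  # threshold the Spark UI uses for highlighting GC time

@dataclass
class ExecutorStats:
    executor_id: str
    task_time_ms: int
    gc_time_ms: int

def is_red(stats: ExecutorStats, threshold: float = GC_RED_FRACTION) -> bool:
    """True when GC time exceeds `threshold` of the executor's total task time."""
    return stats.task_time_ms > 0 and stats.gc_time_ms > threshold * stats.task_time_ms

def red_fraction(executors: list[ExecutorStats]) -> float:
    """Share of executors that would be shown in red."""
    if not executors:
        return 0.0
    return sum(is_red(e) for e in executors) / len(executors)

# Hypothetical sample: one executor at 15% GC, one at 5%
sample = [ExecutorStats("1", 1000, 150), ExecutorStats("2", 1000, 50)]
print(red_fraction(sample))  # 0.5
```

  Comparing this fraction before and after a memory change (8g vs 12g) is one
way to quantify "keeps most executors out of the red state".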
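
  If the shuffle fetch failures persist, the timeout and retry behaviour can be
raised through standard Spark settings. The sketch below renders such overrides
as `--conf key=value` arguments for spark-submit. The keys are real Spark
configuration names (defaults noted in comments), but the values are
hypothetical and untuned, and which setting actually governs the observed
timeout should be confirmed against the failing stage's logs.

```python
# Hypothetical overrides: real Spark keys, illustrative values.
SHUFFLE_RETRY_CONF: dict[str, str] = {
    "spark.network.timeout": "300s",      # Spark default: 120s
    "spark.shuffle.io.maxRetries": "10",  # Spark default: 3
    "spark.shuffle.io.retryWait": "15s",  # Spark default: 5s
}

def to_spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Render a conf dict as sorted `--conf key=value` pairs for spark-submit."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

print(" ".join(to_spark_submit_args(SHUFFLE_RETRY_CONF)))
```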
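
  The coalesce-to-repartition change can be illustrated with a simplified model
(not Spark itself) that treats each input partition's cost as its size in
abstract units. `coalesce(n)` without a shuffle is a narrow dependency, so the
upstream computation runs as only n tasks, each processing several input
partitions; `repartition(n)` inserts a shuffle, so upstream work keeps the full
partition count and only the output stage has n tasks. Contiguous grouping and
perfectly even round-robin output are simplifying assumptions.

```python
def coalesce_loads(sizes: list[int], n: int) -> list[int]:
    """Upstream work per task when coalesce(n) merges partitions without a shuffle."""
    loads = [0] * n
    for i, size in enumerate(sizes):
        loads[i * n // len(sizes)] += size  # contiguous grouping (simplification)
    return loads

def repartition_loads(sizes: list[int], n: int) -> tuple[list[int], list[int]]:
    """(upstream, downstream) work per task when repartition(n) adds a shuffle."""
    total = sum(sizes)
    base, extra = divmod(total, n)
    downstream = [base + (1 if i < extra else 0) for i in range(n)]  # even round-robin spread
    return list(sizes), downstream

sizes = [10] * 64  # 64 input partitions of 10 units each
print(max(coalesce_loads(sizes, 4)))  # 160: each upstream task computes 16 partitions
up, down = repartition_loads(sizes, 4)
print(max(up), max(down))  # 10 160: upstream stays fine-grained, shuffle regroups output
```

  In this model the output stage receives the same amount of data either way;
what changes is that the expensive upstream computation runs in small tasks,
which is the trade the bullet describes: less memory pressure per task in
exchange for an extra shuffle.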
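
  The `wiki=wikidata` fix suggests a cheap guard: validate a partition
specification against the table's partition columns before building the
filter, so a key the table does not have fails loudly. This is a hypothetical
helper; the column set below is an assumption, not the actual layout of the
sparql event table.

```python
# Hypothetical partition columns; the real table layout may differ.
EVENT_PARTITION_COLUMNS = {"year", "month", "day", "hour"}

def partition_predicate(spec: dict[str, object], partition_columns: set[str]) -> str:
    """Build a `col = value AND ...` predicate, rejecting non-partition keys."""
    unknown = sorted(set(spec) - partition_columns)
    if unknown:
        raise ValueError(f"not partition columns: {unknown}")
    clauses = []
    for column, value in spec.items():
        if isinstance(value, str):
            if "'" in value:
                raise ValueError("quote characters are not supported in this sketch")
            literal = f"'{value}'"
        else:
            literal = str(value)
        clauses.append(f"{column} = {literal}")
    return " AND ".join(clauses)

print(partition_predicate({"year": 2022, "month": 3}, EVENT_PARTITION_COLUMNS))
# partition_predicate({"wiki": "wikidata"}, EVENT_PARTITION_COLUMNS) raises ValueError
```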

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF, EBernhardson
Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, 
Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, 
Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, 
ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, 
Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331
_______________________________________________
Wikidata-bugs mailing list