EBernhardson added a comment.
All dags are now enabled, and each has completed at least one full execution.

- Increased the partition count on map_subgraph_queries to 2048. The largest shuffle is ~600GB, and this gets the per-executor work down into the desired 256-512M range.
- Increased executor memory on map_subgraph_queries from 8g to 12g. Many executors were red, with >10% of time spent in GC. This often leads to intermittent failures that become more frequent as data sizes increase. 12g appears to keep most executors out of the red state.
- Seeing intermittent failures in map_subgraph_queries. Usually internal spark retries manage to work through them, but I have seen failures that roll up to the airflow retry level. We might want to increase the timeout waiting on the shuffle server if this persists. Spark potentially addressed this issue in 3.0 with https://issues.apache.org/jira/browse/SPARK-24355
- Mentioned to the analytics team that we have a few new high-resource jobs running. These jobs are all in the `sequential` pool, so they shouldn't cause any downstream issues, but it seems appropriate to let them know.
- Switched SubgraphQueryMapper from coalesce to repartition. Same reasoning as in the weekly dag: the final jobs were hitting OOMs, and letting them compute with the full partition count allows them to complete, at the expense of an additional shuffle.
- Removed `wiki=wikidata` from the sparql event partition specification in subgraph_and_query_metrics. There is no wiki column in this table; rather, it is limited to wdqs (TODO: is that true? Can wcqs end up in here?), which is implicitly limited to wikidata.
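As a rough check on the partition sizing above, here is a minimal Python sketch of the arithmetic. It assumes the shuffle is spread evenly across partitions and reads "600GB" and "512M" as binary units. The helper names and the choice to round up to a power of two are illustrative assumptions, not taken from the dag code.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def per_partition_bytes(shuffle_bytes: int, partitions: int) -> float:
    """Average shuffle bytes per partition, assuming an even distribution."""
    return shuffle_bytes / partitions


def partitions_for_target(shuffle_bytes: int, target_bytes: int) -> int:
    """Smallest power-of-two partition count whose average partition size
    is at or below target_bytes (power-of-two rounding is an assumption)."""
    needed = math.ceil(shuffle_bytes / target_bytes)
    return 1 << (needed - 1).bit_length()


shuffle = 600 * GIB  # approximate size of the largest shuffle
print(per_partition_bytes(shuffle, 2048) / MIB)    # 300.0
print(partitions_for_target(shuffle, 512 * MIB))   # 2048
```

Under these assumptions, 2048 partitions average about 300M each, which sits inside the 256-512M range. Skewed keys would push individual partitions above that average.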
TASK DETAIL https://phabricator.wikimedia.org/T303831
