EBernhardson added a comment.
In T303831#8060472 <https://phabricator.wikimedia.org/T303831#8060472>, @AKhatun_WMF wrote:

> In T303831#8058159 <https://phabricator.wikimedia.org/T303831#8058159>, @EBernhardson wrote:
>
>> the airflow patch is deployed but i only turned on *_init dags and subgraph_mapping_weekly today (ran out of time, will do rest tomorrow).
>>
>> subgraph_mapping_weekly failed the first time through. I updated executor memory from 8g to 12g but the second execution is still failing. something is quite unbalanced about the topSubgraphItems, of the 8 shards they have inputs varying from 100MB to 450MB giving executions times of ~30s on the small ones and ~8m before the final one fails.
>>
>> Not specifically related to this patch, but i wonder if we could change up the `SparkUtils.saveTables` method to somehow take parameters in the path to specify coalesce vs repartition and the number of partitions to save by, so we only have to update the airflow invocation and not the jar as well to test variations there.
>
> Should we have params called `coalesce`, and `repartition`, and have them default to false. And when true, use `num_partitions` to coalesce or repartition accordingly?
>
> Edit: I realize all arg classes that need to coalesce or repartition will need to have these params set.

In this case i was thinking that we could somehow treat the string that is provided over the command line as a specification for how/where to store things and somehow include named parameters in it. So for example right now we provide:

  --all-subgraphs-table discovery.wikibase_rdf/date=20220620/wiki=wikidata

What if instead we could provide (syntax to be bikeshedded):

  --all-subgraphs-table discovery.wikibase_rdf/date=20220620/wiki=wikidata;repartition=42

This would have the downside that read/write would have different syntaxes and we have to know which to use where, maybe there are better options. Mostly pondering ideas on how to make things we know might have to be modified easier to change.
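
To make the proposed syntax concrete, here is a minimal sketch of how such a string could be split into a table, its partition values, and an optional write hint. It is written in Python purely for illustration; the real parsing would live in the jar. `TableSpec` and `parse_table_spec` are hypothetical names, not existing code, and the `;key=value` suffix is only the syntax floated above, not an agreed format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical parsed form of a table argument."""
    table: str                            # e.g. "discovery.wikibase_rdf"
    partition: Dict[str, str]             # e.g. {"date": "20220620", "wiki": "wikidata"}
    mode: Optional[str] = None            # "coalesce", "repartition", or None
    num_partitions: Optional[int] = None  # partition count for the chosen mode


def parse_table_spec(spec: str) -> TableSpec:
    """Parse 'table/key=value/...[;coalesce=N|;repartition=N]' (assumed syntax)."""
    # Everything before the first ';' is the existing table/partition path.
    location, *options = spec.split(";")
    table, *partition_parts = location.split("/")
    # A partition part without '=' makes dict() raise ValueError.
    partition = dict(part.split("=", 1) for part in partition_parts)

    mode: Optional[str] = None
    num_partitions: Optional[int] = None
    for option in options:
        key, sep, value = option.partition("=")
        if not sep or key not in ("coalesce", "repartition"):
            raise ValueError(f"unsupported option: {option!r}")
        if mode is not None:
            raise ValueError("only one of coalesce/repartition may be given")
        mode, num_partitions = key, int(value)
        if num_partitions <= 0:
            raise ValueError("partition count must be positive")

    return TableSpec(table, partition, mode, num_partitions)


spec = parse_table_spec(
    "discovery.wikibase_rdf/date=20220620/wiki=wikidata;repartition=42"
)
print(spec.table, spec.partition, spec.mode, spec.num_partitions)
```

On the write side, the saving code could then call `coalesce(n)` or `repartition(n)` on the DataFrame depending on `mode`, and leave the partitioning untouched when no option is given, so that existing invocations keep working unchanged.
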
There are probably other ways to magic parameters into various places in the jvm world, this is just a first guess.

TASK DETAIL
  https://phabricator.wikimedia.org/T303831
