EBernhardson added a comment.

  In T303831#8060472 <https://phabricator.wikimedia.org/T303831#8060472>, @AKhatun_WMF wrote:
  
  > In T303831#8058159 <https://phabricator.wikimedia.org/T303831#8058159>, @EBernhardson wrote:
  >
  >> The Airflow patch is deployed, but I only turned on the *_init DAGs and subgraph_mapping_weekly today (ran out of time; will do the rest tomorrow).
  >>
  >> subgraph_mapping_weekly failed on its first run. I increased executor memory from 8g to 12g, but the second run is still failing. Something is quite unbalanced in topSubgraphItems: across its 8 shards, inputs range from 100MB to 450MB, so execution times range from ~30s on the small shards to ~8m before the final one fails.
  >>
  >> Not specifically related to this patch, but I wonder if we could change the `SparkUtils.saveTables` method to accept parameters in the path that specify coalesce vs. repartition and the number of partitions to save with. That way, testing variations would only require updating the Airflow invocation, not the jar as well.
  >
  > Should we have params called `coalesce` and `repartition`, both defaulting to false, and when one of them is true, use `num_partitions` to coalesce or repartition accordingly?
  >
  > Edit: I realize every arg class that needs to coalesce or repartition will need to have these params set.
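  A minimal sketch of the flag-based variant, written in Python purely for illustration (the real `SparkUtils.saveTables` lives in the JVM jar and is not shown here). `PartitionArgs` and `apply_partitioning` are hypothetical names. The `coalesce`, `repartition` and `num_partitions` fields follow the proposal above, and the only thing assumed of the DataFrame is that it has `coalesce(n)` and `repartition(n)` methods, as Spark DataFrames do. Sharing one small args type is one possible way to avoid re-declaring the fields in every arg class.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PartitionArgs:
    """Hypothetical shared args: both flags default to False, as proposed."""
    coalesce: bool = False
    repartition: bool = False
    num_partitions: Optional[int] = None

    def __post_init__(self) -> None:
        if self.coalesce and self.repartition:
            raise ValueError("coalesce and repartition are mutually exclusive")
        needs_count = self.coalesce or self.repartition
        if needs_count and (self.num_partitions is None or self.num_partitions <= 0):
            raise ValueError("num_partitions must be a positive int when a flag is set")


def apply_partitioning(df: Any, args: PartitionArgs) -> Any:
    """Return df reshaped per args; df only needs coalesce(n) and repartition(n)."""
    if args.coalesce:
        # coalesce merges existing partitions and avoids a full shuffle
        return df.coalesce(args.num_partitions)
    if args.repartition:
        # repartition shuffles rows into roughly equal-sized partitions
        return df.repartition(args.num_partitions)
    # Neither flag set: keep whatever partitioning df already has
    return df
```

  Rejecting `coalesce` and `repartition` together when the args are constructed means an ambiguous invocation fails early instead of silently picking one.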
  
  In this case I was thinking that we could somehow treat the string provided on the command line as a specification of how and where to store things, and include named parameters in it. For example, right now we provide:
  
    --all-subgraphs-table discovery.wikibase_rdf/date=20220620/wiki=wikidata
  
  What if instead we could provide (syntax to be bikeshedded):
  
    --all-subgraphs-table discovery.wikibase_rdf/date=20220620/wiki=wikidata;repartition=42
  
  The downside is that reads and writes would use different syntaxes, and we would have to know which one to use where; maybe there are better options. Mostly I'm pondering how to make the things we know might need tuning easier to change. There are probably other ways to magic parameters into various places in the JVM world; this is just a first guess.
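  A rough sketch of how the `;option=value` suffix could be parsed, again in Python only for illustration and with hypothetical names (`TableSpec`, `parse_table_spec`, `partitioning`). It assumes the first `/` segment is the table, the remaining `/` segments are partition `key=value` pairs, and every `;`-separated segment after the location is a write option. None of this is the jar's existing parsing code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TableSpec:
    """Hypothetical parsed form of 'db.table/key=value/...;option=value;...'."""
    table: str
    partition: Dict[str, str] = field(default_factory=dict)
    options: Dict[str, str] = field(default_factory=dict)


def _split_pair(text: str) -> Tuple[str, str]:
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {text!r}")
    return key, value


def parse_table_spec(spec: str) -> TableSpec:
    # Segments after the first ';' are write options, also ';'-separated
    location, *raw_options = spec.split(";")
    # First '/' segment is the table; the rest are partition key=value pairs
    table, *raw_partition = location.split("/")
    if not table:
        raise ValueError(f"missing table name in {spec!r}")
    partition = dict(_split_pair(p) for p in raw_partition)
    options = dict(_split_pair(o) for o in raw_options)
    return TableSpec(table, partition, options)


def partitioning(spec: TableSpec) -> Optional[Tuple[str, int]]:
    """Return ('coalesce' or 'repartition', n) if the spec requests one, else None."""
    chosen = [name for name in ("coalesce", "repartition") if name in spec.options]
    if len(chosen) > 1:
        raise ValueError("use either coalesce or repartition, not both")
    if not chosen:
        return None
    return chosen[0], int(spec.options[chosen[0]])
```

  With the example above, `parse_table_spec("discovery.wikibase_rdf/date=20220620/wiki=wikidata;repartition=42")` gives table `discovery.wikibase_rdf`, partition `{"date": "20220620", "wiki": "wikidata"}`, and `partitioning(...)` returns `("repartition", 42)`. A spec without `;` has no options, so a reader that only needs the table and partition could ignore the suffix, which might soften the read/write-syntax concern.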

TASK DETAIL
  https://phabricator.wikimedia.org/T303831

To: AKhatun_WMF, EBernhardson
_______________________________________________
Wikidata-bugs mailing list