AndrewTavis_WMDE added a comment.

  > Good to know, this is definitely a lot lower than I expected, thanks!
  
  Welcome!
  
  I guess another question that I have, and have been discussing with @Manuel a 
bit: is there a benefit to writing the Spark queries in the Python dot notation 
(`data.select(col("subject").alias("schol_art_QID")...`, etc.) vs. writing a 
multi-line string and passing it to `wmfdata.spark.run`? I find a 
well-formatted multi-line query much easier to read and explain to 
stakeholders. In working a bit with the dot notation I haven't seen much of a 
speed increase, but that's hard for me to judge given that the commands were 
run at times of different server load.
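
  For concreteness, the two styles I'm comparing could look roughly like the 
sketch below. This is only an illustration under assumptions: the `triples` 
view name and both function names are made up, `data["subject"]` is used in 
place of `col("subject")` to avoid an import, and plain `spark.sql` stands in 
for `wmfdata.spark.run`.

```python
# Sketch only: `triples` is a hypothetical view name, and plain `spark.sql`
# stands in for `wmfdata.spark.run`.

SQL_QUERY = """
SELECT
    subject AS schol_art_QID
FROM
    triples
"""


def with_dot_notation(data):
    """Dot notation style: rename `subject` through the DataFrame/Column API."""
    # data["subject"] refers to the same column as col("subject").
    return data.select(data["subject"].alias("schol_art_QID"))


def with_query_string(spark, data):
    """Query style: expose the DataFrame as a temp view, then run the string."""
    data.createOrReplaceTempView("triples")
    return spark.sql(SQL_QUERY)
```

  Given a SparkSession `spark` and a DataFrame `data`, calling `.explain()` on 
the result of each function prints its query plan, which might be a way to 
compare the two styles that doesn't depend on server load.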
  
  More generally, are most people at WMF writing Spark Pythonically rather 
than with queries? If there's a need for code review I don't want to force 
people to read something they're not used to, but if there are folks who are 
writing queries and passing them to `spark.run` then I might join that camp 😅 
Happy to get used to it though if there's a general benefit or preference :)

TASK DETAIL
  https://phabricator.wikimedia.org/T342111
