AndrewTavis_WMDE added a comment.
> Good to know, this is definitely a lot lower than I expected, thanks!
Welcome!
Another question I have, and have been discussing a bit with @Manuel: is there
a benefit to writing the Spark queries in the Python dot notation
(`data.select(col("subject").alias("schol_art_QID")...`, etc.) vs. writing a
multi-line string and passing it to `wmfdata.spark.run`? I find a
well-formatted multi-line query much easier to read and explain to
stakeholders. In working a bit with the dot notation I haven't noticed much of
a speed increase, but that's hard for me to judge given that the commands are
being run at times of different server load.
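To make the speed comparison less sensitive to server load, one option is to alternate the two variants over several rounds and compare medians rather than single runs. The following is only a minimal sketch of that idea: `interleaved_timings`, `median_seconds`, `run_sql_string` and `run_dot_notation` are hypothetical names, and the two callables are cheap placeholders, not real Spark queries.

```python
import statistics
import time
from typing import Callable, Dict, List


def interleaved_timings(
    candidates: Dict[str, Callable[[], object]],
    rounds: int = 5,
) -> Dict[str, List[float]]:
    """Time every candidate once per round, alternating between them,
    so that a load spike affects all candidates roughly equally."""
    timings: Dict[str, List[float]] = {name: [] for name in candidates}
    for _ in range(rounds):
        for name, run in candidates.items():
            start = time.perf_counter()
            run()
            timings[name].append(time.perf_counter() - start)
    return timings


def median_seconds(timings: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each candidate's timings to a median, which is less
    sensitive to a single slow run than the mean."""
    return {name: statistics.median(values) for name, values in timings.items()}


# Placeholders (assumption): in practice these would wrap the multi-line
# query string and the equivalent dot-notation query respectively.
def run_sql_string() -> int:
    return sum(range(10_000))


def run_dot_notation() -> int:
    return sum(i for i in range(10_000))


timings = interleaved_timings(
    {"sql_string": run_sql_string, "dot_notation": run_dot_notation},
    rounds=5,
)
medians = median_seconds(timings)
print(medians)
```

For real queries, the dot-notation callable would need to end in an action such as `.toPandas()` or `.count()`, since Spark evaluates DataFrame transformations lazily and would otherwise do no work inside the timed call. Later rounds may also benefit from caching, so the numbers are indicative at best.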
More generally, are most people at WMF writing Spark pythonically rather than
with queries? If there's a need for code review I don't want to force people
to read something they're not used to, but if there are folks who are writing
queries and passing them to `spark.run` then I might join that camp 😅 Happy to
get used to the dot notation though if there's a general benefit or preference :)
TASK DETAIL
https://phabricator.wikimedia.org/T342111