mpopov added a comment.
> are most people at WMF writing spark pythonically and not with queries?

I guess it depends on who you talk to and what they're doing. All of the data scientists/analysts I work with use the Spark SQL engine and write HiveQL queries, often because `hive.run` is too slow. Occasionally I see dot notation for advanced PySpark usage (e.g. Morten's survey aggregation data pipeline <https://github.com/nettrom/Growth-welcomesurvey-2018/blob/master/T275172_survey_aggregation.ipynb>). I suspect dot notation-based Spark usage is more common among software engineers.

TASK DETAIL
  https://phabricator.wikimedia.org/T342111